Mathematical World * Volume 1 


V. M. Tikhomirov 


American Mathematical Society 
Mathematical Association of America 


Mathematical World • Volume 1


Stories about 
Maxima 
and Minima 


V. M. Tikhomirov 
Translated from the Russian by 


Abe Shenitzer 


American Mathematical Society 
Mathematical Association of America 


В. М. ТИХОМИРОВ


РАССКАЗЫ
О МАКСИМУМАХ
И МИНИМУМАХ


«НАУКА», МОСКВА, 1986
Translated from the Russian by Abe Shenitzer 


1991 Mathematics Subject Classification. Primary 00A07,
00A30, 00A35, 01-01, 46-01, 49-01, 49-03, 49J99.


Library of Congress Cataloging-in-Publication Data 


Tikhomirov, Vladimir M. (Vladimir Mikhailovich), 1934-

Stories about maxima and minima / V. M. Tikhomirov.

p. cm. (Mathematical world, ISSN 1055-9426; 1)

ISBN 0-8218-0165-1

1. Maxima and minima. 2. Calculus of variations. 3. Mathematical optimization.
QA306.T55 1990 90-21246
511'.66 dc20 CIP


Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting for them,
are permitted to make fair use of the material, such as to copy a chapter for use in teaching or research.
Permission is granted to quote brief passages from this publication in reviews, provided the customary
acknowledgment of the source is given.

Republication, systematic copying, or multiple reproduction of any material in this publication (including
abstracts) is permitted only under license from the American Mathematical Society. Requests for
such permission should be addressed to the Assistant to the Publisher, American Mathematical Society,
P.O. Box 6248, Providence, Rhode Island 02940-6248. Requests can also be made by e-mail to
reprint-permission@math.ams.org.


© Copyright 1990 by the American Mathematical Society. All rights reserved.
Printed in the United States of America.


The American Mathematical Society retains all rights
except those granted to the United States Government.
The paper used in this book is acid-free and falls within the guidelines
established to ensure permanence and durability.
Printed on recycled paper.


10 9 8 7 6 5 4 3 2     00 99 98 97 96 95


To the Memory 
of My Dear Friend, 
V. M. Alekseev 


Table of Contents


Introduction


Part One. Ancient Maximum and Minimum Problems

The first story       Why Do We Solve Maximum and Minimum Problems?
The second story      The Oldest Problem—Dido’s Problem
The third story       Maxima and Minima in Nature (Optics)
The fourth story      Maxima and Minima in Geometry
The fifth story       Maxima and Minima in Algebra and in Analysis
The sixth story       Kepler’s Problem
The seventh story     The Brachistochrone
The eighth story      Newton’s Aerodynamical Problem


Part Two. Methods of Solution of Extremal Problems

The ninth story       What is a Function?
The tenth story       What is an Extremal Problem?
The eleventh story    Extrema of Functions of One Variable
The twelfth story     Extrema of Functions of Many Variables. The Lagrange Principle
The thirteenth story  More Problem Solving
The fourteenth story  What Happened Later in the Theory of Extremal Problems?
The last story        More Accurately, a Discussion


Bibliography


Introduction 


In daily life it is constantly necessary to choose the best possible (optimal) 
solution. A tremendous number of such problems arise in economics and in 
technology. In such cases it is frequently useful to resort to mathematics. 

In mathematics, the study of maximum and minimum problems began 
a very long time ago, in fact, twenty-five centuries ago. For a long time 
there were no uniform ways of tackling problems for finding extrema. The 
first general methods of investigation and solution of extremal problems were 
created about 300 years ago, at the time of the formation of mathematical 
analysis. 

Then it became clear that certain special optimization problems play a 
crucial role in the natural sciences. Specifically, it was found that many laws 
of nature can be derived from so-called “variational principles.” According 
to these principles, given any collection of admissible motions, what distin- 
guishes the actual motion of a mechanical system, or of light, electricity, a 
fluid, a gas, and so on, is that it maximizes or minimizes certain quantities. 
Some concrete extremal problems, whose content derives from the natural 
sciences (the brachistochrone problem, Newton’s problem, and others), were 
posed at the end of the seventeenth century. The need to solve these, as 
well as many other problems of geometry, mechanics, and physics, led to the 
creation of a new branch of mathematical analysis that came to be known as 
the calculus of variations. 

The intensive development of the calculus of variations continued for 
about two centuries. Many of the finest scientists of the eighteenth and 
nineteenth centuries took part in this process, and, by the beginning of this 
century, it seemed as if they had exhausted the topic. 

But it turned out that this was not the case. The needs of practical life,
especially in economics and technology, gave rise to new problems that could 
not be solved by the old methods. One had to advance. It was necessary 
to create a new field of mathematical analysis, known as “convex analysis,” 
involving the study of convex functions and convex extremal problems. 




The needs of technology, and in particular the exploration of space, gave 
rise to yet another series of problems that were likewise unsolvable by the 
methods of the calculus of variations. Thus, another new theory, known as 
optimal control theory, was created. The fundamental method of optimal 
control theory was worked out in the 1950s and 1960s by Soviet mathemati- 
cians, namely L. S. Pontryagin and his colleagues. This provided a new and 
powerful impulse for further investigations in the theory of extremal prob- 
lems. 

This book aims to acquaint the reader with this whole circle of ideas. How- 
ever, this is not the author’s only purpose. Throughout the history of mathe- 
matics, maximum and minimum problems have played an important role in 
its evolution. During this time many beautiful, important, brilliant, and in- 
teresting problems in geometry, algebra, physics, and so on, have appeared. 
The greatest scientists of the past—Euclid, Archimedes, Heron, Tartaglia, 
Johann and Jakob Bernoulli, Newton, and many others—took part in the so- 
lution of these concrete problems. The solutions stimulated the development 
of the theory and, as a result, techniques were elaborated that made possible 
the solution of a tremendous variety of problems by a single method. 

The author would like the reader to understand how and why a mathemat- 
ical theory is born. In Part One, the reader will get to know many concrete 
problems, and in the course of the discussion of their solutions he will come 
in contact with the creative work of some of the best mathematicians of the 
past. This is not only of historical interest. For the most part, the ideas and 
methods created by eminent mathematicians in connection with the solution 
of problems do not die and are certain to be reborn, given enough time. 
That is why to fathom the conceptions of great men is always an enriching 
experience. 

The need to solve a large number of varied problems establishes the pre- 
conditions for the creation of a general theory. In Part Two I will introduce 
a method for solving maximum and minimum problems that originated with 
Lagrange. The basic conception of this method has endured for over two 
centuries. Its content has varied constantly, but its key thought has remained 
unchanged. It is not a simple matter to understand the reasons for this uni- 
versality of Lagrange’s idea. On the other hand, it is not at all difficult to 
learn to use Lagrange’s principle for the solution of problems. At the end of 
Part Two all problems discussed in Part One, problems marked by the dis- 
similarity of their solutions, are investigated and solved by means of a single 
general method, in a standard way, using one and the same scheme. 

The author has tried to show how the analysis of diverse facts gives rise 
to a general idea, how this idea is transformed, how it is enriched by new 
content, and how it remains the same under all changes. 

With the exception of the concluding part of the fourteenth story, this 
book is primarily aimed at high school students. But I would very much 
like its readers to include college students interested in mathematics and, of
course, teachers. The last story is addressed above all to them. It impinges 
on the question of how and why to teach. I think that the content of the book 
supplies material that is ideally suited for a discussion of this topic, a topic 
that is bound to concern us for many years to come. Thus, I hope that this 
book will also be read by my colleagues who study mathematics and teach it 
to their students. 

I wish to thank all those who read the manuscript and commented on it. 
This refers, above all, to Andrei Nikolaevič Kolmogorov, Nikolai Borisovič
Vasil'ev, Ivan Penkov, and Georgii Georgevič Magaril-Il'yaev.

I am grateful to Prof. E. Barbeau for a number of valuable remarks that 
have been included in the English translation of my book. I also wish to 
express my deep appreciation to Prof. A. Shenitzer for his work as translator. 


V. M. Tikhomirov 


PART ONE 


Ancient Maximum 
and Minimum Problems 


Mathematics...possesses not only truth, but 
supreme beauty...such as only the greatest 
art can show. 


B. Russell 


The most fascinating pursuit is to follow the 
thoughts of a great man. 


A. S. Pushkin


The First Story 




Why Do We Solve 
Maximum and Minimum Problems? 


Nothing takes place in the world whose meaning is 
not that of some maximum or minimum. 


L. Euler 


Most practical questions can be reduced to problems 
of largest and smallest magnitudes...and it is only by 
solving these problems that we can satisfy the require- 
ments of practice which always seeks the best, the 
most convenient. 


P. L. Čebyšev


...one wants to reach the very essence.
B. L. Pasternak 


We learn about maxima and minima in school. One ancient problem that 
you may have solved in your geometry lessons is the following: 

A and B are two given points on the same side of a line l. Find a point
D on l such that the sum of the distances from A to D and from D to B
is a minimum (Figure 1.1 on page 4).

FIGURE 1.1

Here it is necessary to find a least value, that is, a minimum. In many
problems it is necessary to find a maximum, the largest value of
something. Both notions, maximum and minimum, are subsumed under
the Latin term extremum. Problems that involve finding maxima and minima 
are called extremal problems. (The term optimization problems has almost 
the same meaning.) The methods of solution and investigation for the vari- 
ous extremal problems constitute distinct chapters of mathematical analysis. 
Together, these methods make up the part of analysis called the theory of 
extremal problems. 

Our aim in this book is to consider two questions: Why do we solve max- 
imum and minimum problems? What are the components of the theory of 
extremal problems? 

Earlier we posed a geometric problem. This problem can be found in 
almost all geometry textbooks. When and why did this problem first appear? 

The presumed author is the famous ancient mathematician Heron of Alex- 
andria. (In this text we will call the problem Heron’s problem.) We all know 
about Heron through the formula for the area of a triangle that bears his 
name. The book containing this problem is titled On Mirrors. Scholars
disagree as to when this book was written, but most believe that it was written 
in the first century A.D. Although Heron’s book has disappeared, we know 
about it from later commentaries. 

I assume that the reader knows about Heron’s problem and has solved it. 
I stated it because it will be very useful for illustrating various points. 

Let’s recall the solution of Heron’s problem. 

Let $B_1$ be the point symmetric to $B$ with respect to the line $l$. Join $A$
to $B_1$. The required point $D$ is the point of intersection of $AB_1$ and $l$ (see
Figure 1.1). Indeed, if $D'$ is a point other than $D$, then

$$|AD'| + |D'B| = |AD'| + |D'B_1| > |AB_1| = |AD| + |DB|. \tag{1}$$


Here and in the sequel [AB] denotes the segment joining the points A and 
B, |AB| denotes the length of [AB] and AB||CD indicates that the lines 
AB and CD are parallel. 

In establishing (1) we made use of symmetry properties that imply the 
equalities $|DB| = |DB_1|$, $|D'B| = |D'B_1|$, and the triangle inequality
$|AD'| + |D'B_1| > |AB_1|$. This completes the solution of the problem.

We note that the required point $D$ has the property that the angle $\alpha$ is
equal to the angle $\beta$. (See Figure 1.1.) Also, the angle $\varphi_1$ is equal to the
angle $\varphi_2$, or, as is usually said, the angle of incidence is equal to the angle of
reflection.
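
The reflection argument can be checked numerically. The following Python sketch is an illustration only. It assumes a coordinate system that is not fixed by the text: the line l is taken as the x-axis, A = (0, a) and B = (d, b) with a, b > 0. The names `heron_point` and `path_length` are hypothetical. The sketch finds D by intersecting the segment from A to the mirror image of B with l, compares the result with a brute-force search along l, and computes the two angles that the segments DA and DB make with l.

```python
import math

def heron_point(a, b, d):
    """x-coordinate of the point D on the x-axis minimizing |AD| + |DB|
    for A = (0, a) and B = (d, b), with a, b > 0 (assumed coordinates)."""
    # Reflect B in the x-axis to get (d, -b). The segment from A to that
    # image crosses the axis where a - t * (a + b) = 0, i.e. t = a / (a + b).
    t = a / (a + b)
    return t * d

def path_length(x, a, b, d):
    """Length of the broken line A -> (x, 0) -> B."""
    return math.hypot(x, a) + math.hypot(d - x, b)

a, b, d = 1.0, 2.0, 3.0
x_star = heron_point(a, b, d)

# Brute-force comparison on a fine grid of candidate points along l.
grid = [i * d / 10000 for i in range(10001)]
x_grid = min(grid, key=lambda x: path_length(x, a, b, d))

# Angles between l and the segments DA and DB.
alpha = math.atan2(a, x_star)
beta = math.atan2(b, d - x_star)

print(x_star, x_grid)
print(path_length(x_star, a, b, d), math.hypot(d, a + b))
print(math.degrees(alpha), math.degrees(beta))
```

The second printed line compares the minimal path length with the straight-line distance from A to the mirror image of B, which is what inequality (1) predicts.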

Using the idea in the argument just presented, try to solve the following 
problems. 


PROBLEM 1. Let C be a given point in the interior of a given angle. Find
points A and B on the sides of the angle such that the perimeter of the triangle
ABC is a minimum.


PROBLEM 2. Given an angle and two points C and D in its interior, find 
points A and B on the sides of the angle such that |CA| + |AB| + |BD| is a
minimum.


Let’s return to Heron’s problem. In his book Heron investigates the laws of 
reflection of light and applies his conclusions to problems related to properties 
of mirrors. In particular, he proves that a parabolic mirror brings to a focus 
the pencil of rays parallel to the mirror’s axis. 

In Heron’s time scholars tried to comprehend the laws of nature by spec- 
ulation and logical arguments, without recourse to experiment. Later in this 
book we will have occasion to talk of the rise of modern experimental science. 
The first great experimenter in the history of science was Galileo Galilei, who 
lived in the seventeenth century. In contrast to Galileo, Heron tried to base 
his explanations of the laws of reflection on logical foundations. He seems to 
have assumed that nature pursues the shortest path. Damianus (sixth century 
A.D.), a commentator on Heron, says that 


Heron...showed that lines inclined at equal angles are the
smallest of all intermediate ones inclined on the same side
of a single line. Proving this, he says that if nature does not
want a ray of light to meander to no purpose, then it breaks
it at equal angles.


Historians of science see in this the first hint of the thought that nature is 
guided by extremal principles. Heron’s idea was developed further by Fermat 
(we will have more to say about this in our third story). Fermat deduced the 
law of refraction of light (established earlier experimentally by Snel) from the 
assumption that what characterizes the trajectory of a light ray moving from 
one point to another in a nonhomogeneous medium is that it is traversed in 
a minimum of time. From that point on, the idea of the extremal character 
of natural phenomena became the guiding light of science. This is confirmed 
by the words of Euler that we chose as an epigraph for this story. 

I will postpone the discussion of the remarkable character of this 
phenomenon; after all, we cannot think of nature as having a purpose. Nevertheless,
what distinguishes the trajectories of light and radio waves, the
motions of pendulums and planets, the flows of liquids and gases, as well 
as many other motions, is that they are all solutions of problems of maxima 
and minima. This fact provides a fruitful means of creating a mathematical 
description of nature. 

This, then, is the main reason that impels us to solve problems of maxima 
and minima and to develop the theory of extremal problems. It gave rise in 
the eighteenth century to a special part of this theory called the calculus of 
variations. 

Another reason for studying these problems is found within ourselves. Hu- 
man beings constantly strive to better themselves, which is why they always 
want to choose the best of existing possibilities. In this endeavor, mathemat- 
ics can sometimes be of help. 

Let’s discuss this again using Heron’s problem as a relevant example. Some 
textbooks state it as a practical problem. The line l is turned into a rectilinear
section of railroad track, points A and B become towns, point D is called 
a railroad platform, and the question is: where should one build the platform 
so that the combined length of the rectilinear highways linking it to the towns 
is minimal? 

What follows are some additional geometric problems of possible practical 
value. Try to think them through yourself. 


PROBLEM 3. Let A, B, and C be three towns. Find D such that the 
combined length of the rectilinear highways linking it to A, B, and C is
minimal. 


PROBLEM 4. Solve Problem 3 for four towns. 


PROBLEM 5. Will the answer to Problem 4 change if we ask for the minimal 
length of highway linking the four towns without specifying that the highway 
links must come together in one point? 


It is clear that such problems are just models of actual situations. In 
reality, all is far more complicated: sections of railroad tracks are not recti- 
linear, highways are not built to follow strictly straight lines, and a “sum of 
distances” alone is seldom an “optimality criterion.” But there is no doubt 
that in building railroad tracks, highways or other roads, gas and oil pipelines, 
and in many other situations, the usual question is how to accomplish the 
task most expediently—say, at least cost. 

Such problems arise constantly in economic activities. Invariably, an ob- 
jective must be attained in the cheapest, fastest, shortest, or most economical 
manner. 

Let’s look at an optimization problem in an economic context. Suppose 
we have supply centers of a certain product, stores, and a truck depot. How 
should the truck depot dispatcher organize the supplying of the necessary
product to the stores for maximum economy? (Problems of this type are 
called transportation problems. We will formulate them more precisely later.) 
To solve such problems, it is necessary to turn to mathematics. 

The methods for solving maximum and minimum problems developed 
through the middle of the present century proved inadequate for the solution 
of problems similar to our transportation example. One of the things that 
came to light is that in many economic problems the notion of convexity plays 
a key role. Since one often encounters convex as well as linear functions and 
sets in this area, the need arose for a comprehensive theory of convex sets 
and functions now known as convex analysis. This development gave rise to 
new directions in the theory of extremal problems called linear and convex 
programming. They were initiated in the 1930s by the Soviet mathematician 
L. V. Kantorovič.

Most optimization problems deal with technological processes, tools, and 
systems. Here is a relevant example. Consider a cart moving rectilinearly 
and without friction on horizontal rails. The cart is controlled by an external 
force that can be varied within prescribed bounds. The cart is to be stopped at 
a definite location in the shortest possible time. This problem exemplifies the 
simplest problem of rapid response under automatic control. It is an instance 
of the multitude of problems that have arisen in the chemical industry, in 
space travel, and in other technological areas that could not be handled by the 
methods of the calculus of variations. Thus, it became necessary to create a 
new field to supplement the calculus of variations. This new field was called 
optimal control. 
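
To make the cart problem concrete, here is a minimal Python sketch under simplifying assumptions that are not in the text: the cart has unit mass, it starts at rest, and the force is bounded in absolute value by a constant. The function name `accelerate_then_brake` and all parameters are hypothetical. The sketch simulates one admissible strategy (push with the full allowed force up to the midpoint, then brake with the full allowed force) and compares its stopping time with the same strategy using only half of the allowed force. It illustrates the kind of question optimal control asks and does not prove that either strategy is optimal.

```python
def accelerate_then_brake(distance, force, dt=1e-4):
    """Simulate a unit-mass cart that starts at rest at position 0, is pushed
    with +force until it passes the midpoint, and is then braked with -force
    until its velocity returns to zero. Returns (elapsed time, final position)."""
    x, v, t = 0.0, 0.0, 0.0
    while True:
        u = force if x < distance / 2 else -force
        v_new = v + u * dt
        if u < 0 and v_new <= 0:
            break  # the cart has come to rest
        x += 0.5 * (v + v_new) * dt  # trapezoidal position update
        v = v_new
        t += dt
    return t, x

t_full, x_full = accelerate_then_brake(distance=1.0, force=1.0)
t_half, x_half = accelerate_then_brake(distance=1.0, force=0.5)

print(t_full, x_full)
print(t_half, x_half)
```

For this rest-to-rest manoeuvre elementary kinematics gives a stopping time of 2·sqrt(distance/force), so the simulated times can be compared with that closed form.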

All this points to yet another reason for the solution of optimization prob- 
lems and the development of the theory of extremal problems. To quote 
Čebyšev, it is the wish “to satisfy the requirements of practice.” But these
reasons do not explain the whole mystery. 

The next story deals with the oldest maximum and minimum problem, 
namely the classical isoperimetric problem. Some twenty-five centuries ago, 
in ancient Greece, it was discovered that of all closed curves of a given length, 
the circle has the remarkable property of enclosing the largest area. In school 
you probably encountered problems describing analogous properties of poly- 
gons. Let’s recall two such exercises. 


PROBLEM 6. Find a triangle of given perimeter that has maximal area. 


PROBLEM 7. Show that of all rectangles of given perimeter, the square has 
the largest area. 
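
Before attempting proofs of Problems 6 and 7, one can explore them numerically. The Python sketch below is only an experiment and is no substitute for a proof. The perimeter value, the grid spacings, and the function names `rectangle_area` and `triangle_area` are arbitrary choices made for illustration. It scans rectangles and triangles of a fixed perimeter and reports the shape of largest area found, using Heron's formula for the triangles.

```python
import math

def rectangle_area(width, perimeter):
    """Area of the rectangle with the given width and perimeter."""
    height = perimeter / 2 - width
    return width * height

def triangle_area(x, y, z):
    """Heron's formula for a triangle with side lengths x, y, z."""
    s = (x + y + z) / 2
    return math.sqrt(max(s * (s - x) * (s - y) * (s - z), 0.0))

P = 12.0  # common perimeter (arbitrary choice)

# Problem 7: scan rectangle widths between 0 and P / 2.
widths = [k * (P / 2) / 600 for k in range(1, 600)]
best_width = max(widths, key=lambda w: rectangle_area(w, P))

# Problem 6: scan side lengths x, y on a grid; the third side is z = P - x - y.
step = 0.1
candidates = []
for i in range(1, 120):
    for j in range(1, 120):
        x, y = i * step, j * step
        z = P - x - y
        if z > 0 and x + y > z and y + z > x and z + x > y:
            candidates.append((triangle_area(x, y, z), x, y, z))
best_triangle = max(candidates)

print(best_width, rectangle_area(best_width, P))
print(best_triangle)
```

On this grid the best rectangle found has equal sides and the best triangle found has three equal sides, which is what the two problems ask you to establish in general.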


A problem equivalent to Problem 7 is dealt with already in Euclid’s El- 
ements. Also, Fermat used the solution of this very problem to illustrate 
his method of finding maxima and minima, a method known as Fermat’s 
theorem. 

Why were such problems posed and solved? What is the secret of their
attraction? Why do the authors of most geometry books like to deal with 
problems of maxima and minima? 

These questions are not easily answered. The fact remains, however, that 
throughout the history of mathematics, extremal problems have elicited in- 
terest and a desire to solve them. It is conceivable that what is behind this is 
our natural quest for perfection, some secret tendency to comprehend “the 
very essence.” Perhaps it is that most—if not all—extremal problems contain 
an element of grace, of attractiveness, of the beauty of which Russell speaks, 
and this impels us to solve maximum and minimum problems. 

I have said enough for you to appreciate the importance and interest of 
the subject I have chosen. 

Perhaps it is relevant to indicate the temporal bounds of our stories. The 
earliest maximum and minimum problems were posed in the distant past. 
In fact, the classical isoperimetric problem covered in the next story was 
investigated in the fifth century B.C. And in the fourteenth story we will 
deal with problems arising in our own time. 

For a long time, each extremal problem was solved individually. In the 
seventeenth century, there was a clear awareness of the need to create some 
general methods. Such methods were developed by Fermat, Newton, Leibniz, 
and others; first for one, then for finitely many, and, ultimately, for infinitely 
many variables. These methods led to the formulation of the basic divisions 
of the theory of extremal problems: mathematical programming (that is, 
the theory of finite-dimensional optimization problems), convex (including 
linear) programming (where one studies convex optimization problems), the 
calculus of variations, and the theory of optimal control. 

This book is divided into two parts. The first part consists of ancient 
problems, posed and solved, as a rule, before the invention of the first general 
methods. In the second part we will discuss some of the methods of the theory 
of extremal problems. 

In the first part we will discuss problems connected with the names of 
the greatest mathematicians of various epochs such as Euclid, Archimedes, 
Fermat, Kepler, Huygens, Johann Bernoulli, Newton, and Leibniz. I have not 
denied myself the pleasure “of following the thoughts” of these great men. 

And in the second part.... Of that it is as yet too early to talk. 


The Second Story 

The Oldest Problem—Dido’s Problem


They bought as much land—and called it Birsa—as 
could be encircled with a bull’s hide. 


The Aeneid of Vergil 


The most beautiful solid is the sphere, and the most 
beautiful plane figure—the circle. 


Pythagoras 


We took as an epigraph for this story two lines from the Aeneid of Publius 
Vergilius Maro, one of the greatest poets of ancient Rome. Like all immortal 
creations, the Aeneid tells the story of human passions, of good and evil, 
of fate and suffering, of guile and love, of life and death. The quoted lines 
refer to an event that tradition placed in the ninth century B.C. We recall the 
legend reproduced in the Aeneid. 

Fleeing from persecution by her brother, the Phoenician princess Dido set 
off westward along the Mediterranean shore in search of a haven. A certain 
spot on the coast of what is now the bay of Tunis caught her fancy. Dido 
negotiated the sale of land with the local leader, Yarb. She asked for very 
little—as much as could be “encircled with a bull’s hide.” Dido managed to 
persuade Yarb, and a deal was struck. Dido then cut a bull’s hide into narrow 
strips, tied them together, and enclosed a large tract of land. On this land 
she built a fortress and, near it, the city of Carthage. There she was fated to 
experience unrequited love and a martyr’s death. 

This incident suggests the question: How much land can be enclosed by a 
bull’s hide? 

Why begin with this problem? After all, its solution is rather difficult. It
would seem reasonable to begin with simpler matters. Still, I choose another 
road. In this part I will move not from simple to complex matters, but 
from the distant past to our own days. That is why I want “to begin at the 
beginning.” Is it not remarkable that such difficult and profound problems 
were posed and solved in those mythical times? Our predecessors knew so 
much less than we do, but they persevered and attained their objective! 

How much land, then, can one enclose by a bull’s hide? To answer this 
question, we must pose it in a mathematically correct manner. A modern 
mathematician would say: 

Among all closed plane curves of a given length, find the one that encloses 
the largest area. 

This question is known as Dido’s problem, or the classical isoperimetric 
problem. (Isoperimetric figures are figures that have the same perimeter.) 

So far we have managed with words alone. A person with a sufficiently 
high level of mathematical culture is completely satisfied with this kind of 
formulation, for he knows what is meant by “curve,” “length,” and “area.” 
It took more than 2000 years to assign precise meanings to these words. To 
properly clarify these terms would require another book, so we will approach 
our problem in the “naive” manner of the ancients (and in the manner dic- 
tated by practical considerations to Princess Dido herself). But we will try 
to do without a bull’s hide. 

We unwind some thread from a spool, cut it, tie the ends together, and 
put the tied thread on a sheet of paper. The result is a plane closed curve. 
If we now cut out the piece of paper along the contour of the thread, then 
we obtain a representation of the area enclosed by this curve. This area can 
be measured. If our sheet of paper is a sheet of millimeter paper, then the 
measurement can be quite accurate. Now the question posed by the problem 
is clear: we are to explain how to place our thread to enclose a maximum 
area. 

I will soon show that the curve that solves the classical isoperimetric prob- 
lem is a circle. In describing Dido’s actions, Vergil used the word “circum- 
dare” (to encircle) containing the root circus (circle). This suggests that Dido 
solved the classical isoperimetric problem correctly. 
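
The thread experiment can be imitated on a computer. The Python sketch below is a rough illustration under assumptions of its own: the thread has length 1, and it is laid out only as a regular n-gon for a few values of n, which is a very restricted family of curves. The function names are hypothetical. Areas are computed with the shoelace formula and compared with the area L²/(4π) of the circle of the same length.

```python
import math

def shoelace_area(points):
    """Area of a simple polygon given by its vertices in order."""
    n = len(points)
    twice = sum(points[i][0] * points[(i + 1) % n][1]
                - points[(i + 1) % n][0] * points[i][1] for i in range(n))
    return abs(twice) / 2

def regular_polygon(n, length):
    """Vertices of a regular n-gon whose perimeter equals `length`."""
    # Each side has length 2 * R * sin(pi / n), where R is the circumradius.
    radius = length / (2 * n * math.sin(math.pi / n))
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

L = 1.0  # length of the thread (arbitrary unit)
circle_area = L ** 2 / (4 * math.pi)

areas = {n: shoelace_area(regular_polygon(n, L)) for n in (3, 4, 6, 12, 100)}
for n, value in areas.items():
    print(n, value, circle_area - value)
```

In this experiment the enclosed area grows with the number of sides and stays below the area of the circle, which is consistent with the isoperimetric property but of course does not prove it.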

Many historians are of the opinion that this was the first extremal problem 
discussed in the scientific literature. In addition to noting the isoperimet- 
ric property of the circle (that is, the property of the circle to enclose the 
largest area among all isoperimetric figures), ancient geometers also noted 
the isoepiphanic property of the sphere (that is, the property of the sphere 
to enclose the largest volume among all figures with the same surface area). 
This property of maximal capacity was the basis of the notion that the circle 
and sphere are the embodiments of geometric perfection (recall the words of 
Pythagoras that serve as an epigraph for this story). 

Another confirmation of the same thought is found in the words of Nicolaus
Copernicus:


In the first place we must observe that the universe is spher- 
ical. This is either because that figure is the most perfect, 
as not being articulated, but whole and complete in itself; or 
because it is the most capacious and therefore best suited for 
that which is to contain and preserve all things... 


It is now impossible to tell when the thought of the maximal capacity 
of the circle and the sphere was first advanced. At any rate, Aristotle (4th 
century B.C.)—one of the greatest thinkers in human history—treats these 
facts as given. And who (other than Dido) did, in fact, solve the isoperimetric 
problem? The literature devoted to the isoperimetric property of the circle 
and the isoepiphanic property of the sphere is vast. One of the immense 
number of these works is by the German geometer W. Blaschke [2], which 
includes historical references. Should you be tempted to follow the history
of the isoperimetric problem “from its beginning in hoary antiquity with
the legend of the Carthaginian princess Dido to Herr Geheimrat Hermann
Amandus Schwarz from Berlin” [1], you could turn to Blaschke’s paper [2].

One of the presumed solvers of the isoperimetric and isoepiphanic problems
mentioned by the ancient authors is Archimedes. H. A. Schwarz is
thought to have given the first rigorous proofs of the maximum property of
the circle and the sphere.

But in fact, Schwarz (and before him Weierstrass, and after him Blaschke
himself, and numerous other mathematicians in the nineteenth and twentieth
centuries) should be given credit, in connection with the isoperimetric
problem, merely for shaping the ideas of their distant predecessors so as
to meet the requirements of rigor of their time. The basic ways of solving 
the isoperimetric problem were already outlined with absolute correctness 
in ancient times. We will now describe one such way, due to Zenodorus, 
a mathematician who is thought to have lived sometime between the third 
century B.C. and the first century A.D. 

Zenodorus proves completely rigorously—by the standards of his time— 
the following assertion. 

If there exists a plane n-gon having largest area among all n-gons of given 
perimeter, then it must have equal sides and equal angles. 

In the interest of brevity, we will call a plane n-gon of largest area, among
all n-gons isoperimetric with it, a maximal n-gon. Using this term we can 
state Zenodorus’ theorem more briefly. 

A maximal n-gon (if one exists) must be regular. 

Zenodorus’ theorem follows from two lemmas. 


LEMMA 1. A maximal n-gon must have equal sides. 

LEMMA 2. A maximal n-gon must have equal angles.


When presenting the works of our distant mathematical predecessors, I 
will not, as a rule, reproduce them literally, preserve the notation and style 
of the authors, or strive to give the authors’ own proofs. Instead, I will 
reproduce their basic direction of thought and general spirit of argument, 
while changing and modernizing formulations and proofs. In particular, I 
will present modified proofs of Lemmas 1 and 2. I will use the solution of
Heron’s problem twice. 

Before presenting the proofs, it is necessary to make an observation not 
mentioned by Zenodorus. As we are about to show, a nonconvex polygon 
cannot be maximal. Indeed, suppose that the angle $A_1A_2A_3$, say, is larger
than 180°. (See Figure 2.1.) Let $A_2'$ be the image of the vertex $A_2$ under
reflection in the line $A_1A_3$. The polygon $A_1A_2'A_3 \ldots A_n$ has greater area than
the polygon $A_1A_2A_3 \ldots A_n$ and is isoperimetric with it. Now we are ready
to give: 


PROOF OF LEMMA 1. Let $A_1A_2 \ldots A_n$ be a maximal n-gon. As noted, it
is a convex figure. We suppose that not all of its sides are equal and deduce
a contradiction.

Let $A_1A_2$ and $A_2A_3$ be two adjacent unequal sides. Let $l$ be the line
through $A_2$ parallel to $A_1A_3$. (See Figure 2.2.) Now consider Heron’s problem
for the line $l$ and the points $A_1$ and $A_3$. Recall that this is the problem
of finding a point $D$ on $l$ that minimizes the sum of the distances
$|A_1D| + |A_3D|$. As was proved in the previous section, the angles $\alpha$ and $\beta$
at $D$ must be equal. But $\alpha$ is equal to the angle $DA_1A_3$, and $\beta$ is equal
to the angle $DA_3A_1$ (by the property of opposite alternate angles between
parallels). This means that $A_1DA_3$ is an isosceles triangle, and therefore $D$
is different from $A_2$. Furthermore,

(a) the area of $\triangle A_1DA_3$ is equal to the area of $\triangle A_1A_2A_3$, since they
have equal altitudes and bases; and

(b) the sum of the sides $A_1D$ and $DA_3$ is less than the sum of the sides
$A_1A_2$ and $A_2A_3$, since $D$ ($\ne A_2$) is the solution of Heron’s problem.

We now construct the isosceles triangle $A_1A_2'A_3$ such that $|A_1A_2'| + |A_2'A_3|
= |A_1A_2| + |A_2A_3|$. Its area is, of course, larger than the area of $\triangle A_1A_2A_3$,
since the altitude $A_2'C$ is larger than the altitude $DC$ (by virtue of the
fact that $|A_1A_2'|$ is longer than $|A_1D|$). But this means that the area of
the polygon $A_1A_2'A_3 \ldots A_n$ is greater than the area of the polygon $A_1A_2 \ldots A_n$
isoperimetric with it, a conclusion that contradicts the maximality of the
latter polygon. This completes the proof of Lemma 1.
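
The “raising” step of this proof rests on the fact that, for a fixed base and a fixed sum of the two lateral sides, the isosceles triangle has the largest area. The following Python sketch checks this numerically with Heron’s area formula. It is only an illustration, not part of Zenodorus’ argument, and the function names are chosen here for convenience.

```python
import math

def triangle_area_from_sides(base, u, w):
    """Area of a triangle with side lengths base, u, w (Heron's formula)."""
    s = (base + u + w) / 2
    return math.sqrt(max(s * (s - base) * (s - u) * (s - w), 0.0))

def best_split(base, side_sum, steps=1000):
    """Among triangles with the given base and u + w = side_sum,
    return the lateral side u (on a uniform grid) that gives the largest area."""
    candidates = [side_sum * k / steps for k in range(1, steps)]
    # keep only genuine triangles: the lateral sides must differ by less than the base
    valid = [u for u in candidates if abs(2 * u - side_sum) < base]
    return max(valid, key=lambda u: triangle_area_from_sides(base, u, side_sum - u))

if __name__ == "__main__":
    u = best_split(4.0, 6.0)
    print(u, triangle_area_from_sides(4.0, u, 6.0 - u))
```

On the grid used here the best split is the symmetric one, u = w, in agreement with the proof.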


COROLLARY. Lemma 1 implies that a maximal triangle is equilateral and
a maximal quadrilateral is a rhombus. 


This corollary and Figure 2.3 justify the conclusion that a maximal quadri- 
lateral is, in fact, a square. 


PROOF OF LEMMA 2. Again, let $A_1A_2 \ldots A_n$ be a maximal polygon. We
know by now that all its sides are equal (Lemma 1) and bear in mind that it
must be convex. We will suppose that not all of its angles are equal and will
deduce a contradiction. If the angles are not all equal, then there must be
two unequal adjacent angles, $\alpha$ and $\beta$, say. We will show that this implies
the existence of two unequal nonadjacent angles.

Consider the successive angles $\alpha, \beta, \gamma, \delta, \varepsilon, \ldots$ (there are no fewer than
five) of the polygon. If $\gamma \ne \alpha$ or $\delta \ne \beta$, then the proof is complete, since $\alpha$
and $\gamma$ (or $\beta$ and $\delta$) are nonadjacent. If $\alpha = \gamma$, $\beta = \delta$, and $\alpha \ne \beta$, then
our sequence of angles is $\alpha, \beta, \alpha, \beta, \varepsilon, \ldots$, and the proof is complete, since
the first and fourth angles are nonadjacent.

We see that our assumption justifies the conclusion that there are two
triangles $DEF$ and $PQR$ with disjoint interiors (Figure 2.4), each of which
is formed by successive vertices of our n-gon and such that angle $E$ is smaller
than angle $Q$. Since $|DE| = |EF| = |PQ| = |QR|$, the inequality of the
angles $E$ and $Q$ implies that $|DF| < |PR|$. From $E$ and $Q$ we drop
perpendiculars $EG$ to $DF$ and $QT$ to $PR$. Next, we extend the segment
$EG$ and apply to the extension the triangle $ET'P'$ congruent to the triangle
$QTP$ ($T$ goes over into $T'$, $P$ into $P'$, and $Q$ into $E$). Now we consider
Heron’s problem for the line $T'G$ and the points $P'$ and $F$. Let $S$ be the
solution of Heron’s problem, that is, $S$ is a point on $T'G$ such that the sum
of the distances from $P'$ to $S$ and from $S$ to $F$ is minimal. Since the angle
$P'ET'$ (equal to half the angle $Q$) is larger than the angle $FEG$ (equal to
half the angle $E$), the point $S$ does not coincide with the point $E$ (the
angles $P'ST'$ and $FSG$ are equal) and, furthermore, $S$ lies on the segment
$EG$. Now we lay off on the line $QT$ the segment $TU$ of the same length as
the segment $T'S$ and consider the triangles $DSF$ and $PUR$. The sum of
the lateral sides of these triangles is smaller than the sum of the lateral sides
of the original triangles $DEF$ and $PQR$. In fact,


$$|DS| + |SF| + |PU| + |UR| = 2(|SF| + |SP'|) < 2(|FE| + |EP'|) = |DE| + |EF| + |PQ| + |QR|.$$


We have used the fact that our triangles are isosceles and that $S$ is the
solution of Heron’s problem. On the other hand, the area of $\triangle P'ES$ is larger
than the area of $\triangle ESF$, since their respective altitudes are $|P'T'| = \tfrac12|PR|$
and $|FG| = \tfrac12|DF|$ and we have shown that $|DF| < |PR|$. It follows that
the sum of the areas of the triangles $DSF$ and $PUR$ is greater than the sum
of the areas of the original triangles $DEF$ and $PQR$. In fact, denoting the
area of a triangle $UVW$ by $S_{\triangle UVW}$, we have


(2) $$S_{\triangle DSF} + S_{\triangle PUR} = S_{\triangle DEF} - 2S_{\triangle ESF} + S_{\triangle PQR} + 2S_{\triangle P'ES} > S_{\triangle DEF} + S_{\triangle PQR}.$$

This means that the polygon DSF ...PUR... has a smaller perimeter and 
a larger area than our original polygon DEF ...PQR... . Now we can treat 
either triangle ($DSF$ or $PUR$) as we treated $\triangle A_1DA_3$ in proving Lemma
1, that is, we can raise it to obtain a polygon isoperimetric with the polygon
DEF ...PQR.... Since the area of the new polygon is greater than the area 
of the polygon DSF ...PUR..., it is certainly greater than the area of the 
polygon DEF ...PQR.... This contradicts the maximality of the polygon 
DEF ...PQR... and completes the proof of Lemma 2 and, thereby, also of 
the theorem of Zenodorus. 

It remains to deduce from this theorem a proof of the classical isoperi- 
metric theorem. 


Lemma on the existence of a maximal n-gon. We have shown that if a
maximal n-gon exists then it must be regular. But does a maximal n-gon
exist? If it doesn’t, the solution of Dido’s problem turns to dust and ashes.
After all, not all functions attain a maximum. For example, the function
$f(x) = -(1 + x^2)^{-1}$ doesn’t (this example is analyzed in greater detail in the
eleventh story).

The ancient authors did not concern themselves with questions of existence 
of solutions. It was only some 100 years ago that mathematicians began to 
appreciate the significance of existence questions and to develop methods of 
proof of existence theorems. Later we will have many occasions for dealing 
with these questions. Here we will state without proof the following assertion 
(whose truth seems to have been obvious to Zenodorus). 


LEMMA 3. There exists a maximal n-gon.

This and Lemmas 1 and 2 imply:

THEOREM 1. A maximal n-gon is regular.


Now there is little left to prove. 

COMPLETION OF THE PROOF. Let $P$ denote the perimeter of a regular n-gon
and $S$ its area. We know from geometry that $P = 2nR\sin(\pi/n)$, where
$R$ is the radius of the circumscribed circle, and that $S = rP/2$, where $r$ is
the radius of the inscribed circle. We have $r = R\cos(\pi/n)$. All these yield
the following formula linking $S$ and $P$:


$$P^2 - 4n\tan(\pi/n)\,S = 0.$$


Theorem 1 implies that if $P$ is the perimeter of an arbitrary n-gon and
$S$ is its area, then


(3) $$P^2 - 4n\tan(\pi/n)\,S \ge 0.$$

The inequality $\tan\alpha > \alpha$ (valid for $0 < \alpha < \pi/2$) and (3) imply the
inequality


(4) $$P^2 - 4\pi S > 0,$$


which holds for an arbitrary n-gon and all n. We note that for an arbitrary
circle we have the obvious equality


(5) $$P^2 - 4\pi S = 0,$$


where $P$ is the circumference of the circle and $S$ is its area.
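
Relations (3) to (5) are easy to test numerically. The following Python sketch is an illustration only and is not part of the original argument; the function names are chosen here for convenience. It computes the perimeter and area of a regular n-gon from the formulas above, evaluates the left sides of (3) and (4), and does the same for one arbitrary polygon (a 2-by-1 rectangle).

```python
import math

def regular_ngon(n, R=1.0):
    """Perimeter and area of a regular n-gon with circumradius R."""
    P = 2 * n * R * math.sin(math.pi / n)
    r = R * math.cos(math.pi / n)      # radius of the inscribed circle
    return P, r * P / 2

def polygon_perimeter_area(pts):
    """Perimeter and (shoelace) area of a simple polygon given by its vertices."""
    n = len(pts)
    P = sum(math.dist(pts[i], pts[(i + 1) % n]) for i in range(n))
    twice_area = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
                     for i in range(n))
    return P, abs(twice_area) / 2

def deficit_n(P, S, n):
    """Left side of (3): zero for the regular n-gon, nonnegative for any n-gon."""
    return P * P - 4 * n * math.tan(math.pi / n) * S

def deficit_circle(P, S):
    """Left side of (4) and (5): positive for polygons, zero for a circle."""
    return P * P - 4 * math.pi * S

if __name__ == "__main__":
    for n in (3, 4, 10, 1000):
        P, S = regular_ngon(n)
        print(n, deficit_n(P, S, n), deficit_circle(P, S))
    print(deficit_n(*polygon_perimeter_area([(0, 0), (2, 0), (2, 1), (0, 1)]), 4))
```

For regular n-gons the quantity in (4) stays positive but shrinks as n grows, which is the numerical counterpart of the passage from polygons to the circle.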

Now we will state a lemma linking together all the concepts involved in 
the formulation of the classical isoperimetric problem and the notion of an 
n-gon. Its meaning is that it is possible to approximate the length of a curve
and the area it encloses by means of the length and area of an n-gon, and to
do so with arbitrary precision.


LEMMA 4. For every closed plane curve of length $P^*$ that encloses an area
$S^*$ and for every $\varepsilon > 0$, there is an n-gon of perimeter $P$ and area $S$ such
that


(6) $$|P - P^*| < \varepsilon, \quad |S - S^*| < \varepsilon.$$


Lemma 4 and the relation (4) imply that for every $\varepsilon > 0$ there is a polygon
with perimeter $P$ and area $S$ such that


$$4\pi S^* < 4\pi S + 4\pi\varepsilon < P^2 + 4\pi\varepsilon < (P^* + \varepsilon)^2 + 4\pi\varepsilon = P^{*2} + \varepsilon(2P^* + 4\pi + \varepsilon).$$

Since $\varepsilon$ is arbitrary, we arrive at the final inequality

(7) $$4\pi S^* \le P^{*2}.$$


According to (5), this inequality becomes an equality for a circle.
We sum all this up in the following theorem.


THEOREM 2. The area enclosed by an arbitrary closed curve of given length 
does not exceed the area enclosed by a circle of the same length. 


This completes the solution of the isoperimetric problem. 

COMMENTS. 1. We obtained a complete solution of our problem by combining
the two geometric lemmas of Zenodorus and the two modern, essentially
technical, Lemmas 3 and 4. All information necessary for a proof of
Lemma 3 is to be found in the works of Weierstrass. The notions of the 
length of a curve and of the area enclosed by a curve were made precise by 
Jordan, who thereby provided the basics for a proof of Lemma 4. 

2. Detailed proofs of Lemmas 3 and 4 can be found in Blaschke [2]. 

Before ending this story we will digress one last time. 


Steiner’s proof. Having presented a proof based on the ideas of the ancients,
it is difficult to resist presenting an outline of yet another proof, whose
key thought is due to Jakob Steiner, a mathematician who enriched geometry
with many remarkable ideas. A tacit assumption of Steiner’s proof is the
existence of the curve that solves the isoperimetric problem. (We already know
that this is a justified assumption.) It remains to show that this extremal
curve is a circle.


ASSERTION 1. The extremal curve is convex.


What is a convex curve? It is a curve whose interior (that is, the region 
bounded by the curve) includes the segment joining any two of its points. 

In this connection, we wish to note that convexity plays a key role in 
maximum and minimum problems. We will have things to say about it in 
the sequel. Many remarkable books dealing with convexity are intended for 
high school students. One such book is by Lyusternik [7R] and another by 
Yaglom and Boltyanskii [13]. 

Let’s turn to the proof of Steiner’s theorem and prove Assertion 1.

If the curve is not convex then it must contain two points $A$ and $A'$ such
that both arcs $ABA'$ and $AB'A'$ joining $A$ and $A'$ lie on the same side
of the line $AA'$. (See Figure 2.5.) By replacing one of these arcs with its
image under reflection in $AA'$, we obtain a new curve of the same length
that encloses a larger area.


ASSERTION 2. If points A and B halve the length of the extremal curve,
then the chord [AB] halves the area it encloses. 


In fact, if the chord [AB] divided the area into unequal parts, then the fig- 
ure consisting of the larger part and its image under reflection in the diameter 
AB would add up to a figure with the same length and a larger area. 


ASSERTION 3. Suppose that points A and B halve the extremal curve. If 
C is any point on the curve, then the angle ACB is a right angle.


This is the heart of the matter. The method we will employ to prove this 
assertion is known as Steiner’s hinged-quadrilateral method. 

Suppose there is a point $C$ such that the angle $ACB$ is not a right angle.
The area bounded by the arc $ACB$ and the diameter $AB$ splits into three
parts, namely the triangle $ABC$ and the segments adjacent to the sides $AC$
and $CB$. Now imagine that there is a hinge at $C$ linking together the two
segments. “Spread” the segments so that the angle $ACB'$ is a right angle.
(See Figure 2.6 on page 17.) The area bounded by the arc $ACB'$ will have
increased, because, of all triangles with given lateral sides the right triangle
has maximal area ($S_{\triangle ABC} = \tfrac12|AC|\,|BC|\sin C \le \tfrac12|AC|\,|BC|$, and equality
is attained if the angle is 90°). The figure obtained by reflecting the curve
$ACB'$ in the chord $AB'$ has the same perimeter but a larger area than the
original figure. This proves our assertion.
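
The fact used in the hinge argument, that a triangle with two given sides has the largest area when the angle between them is a right angle, can be seen at a glance from a small computation. The sketch below is only an illustration with hypothetical function names; it scans the hinge angle on a uniform grid and reports where the area is largest.

```python
import math

def triangle_area(a, b, angle_c):
    """Area of a triangle with sides a and b enclosing the angle angle_c (radians)."""
    return 0.5 * a * b * math.sin(angle_c)

def best_hinge_angle(a, b, steps=1800):
    """Angle between 0 and pi (on a grid of pi/steps) that maximizes the area."""
    angles = [math.pi * k / steps for k in range(1, steps)]
    return max(angles, key=lambda c: triangle_area(a, b, c))

if __name__ == "__main__":
    c = best_hinge_angle(3.0, 4.0)
    print(math.degrees(c), triangle_area(3.0, 4.0, c))
```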

We see that the extremal figure consists of all points C from which a chord 
that halves the length of the extremal curve is seen at a right angle—that is, 
the curve in question is a circle. 

An enthusiast will exclaim: “Astounding!” A skeptic will nag: “This hasn’t 
been shown, that must be justified.... Try to prove existence.... How do we 
know that when we spread the hinge far enough, parts of the segments at C 
won't overlap?” We'll ignore his grumbling. Granted, the proof is amazing, 
but it must be justified! 

Of the many books dealing with the isoperimetric problem, I recommend 
Courant and Robbins [3], Kryžanovskii [6R], and Rademacher and Toeplitz
[12] for further reading. 


The Third Story 


3 


Maxima and Minima in Nature (Optics) 


According to Leibniz our world is the best possible. 
That is why its laws can be described by extremal 
principles. 


C. L. Siegel 


Carl Siegel, an eminent twentieth-century mathematician, obtained fun- 
damental results in many areas of mathematics and mechanics. His remark 
that is the epigraph for this story is a joke, of course, but it contains a kernel 
of truth. When discussing Heron’s problem, we had cause to remark that 
nature “employs” extremal principles. For example, we said that a reflection 
from a flat surface “chooses” a trajectory of least length. 

Heron’s words quoted in the first story contain the germ of a fundamental 
idea established between the seventeenth and nineteenth centuries. During 
this time it became clear that nature “operates” optimally in optics, in me- 
chanics, in thermodynamics—in fact, everywhere. 

The extremal principle associated with natural phenomena was clearly for- 
mulated for the first time in optics in an attempt to comprehend the law of 
refraction of light. The book of Tarasov and Tarasova [9R] deals with various 
optical problems and, in particular, with the history of the law of refraction. 

The refraction of light is readily apparent in nature. For example, a pole 
lowered into a calm, transparent lake looks bent as a result of this phe- 
nomenon. 

Ancient philosophers tried to discover the law of refraction. In particular,
in the second century A.D. Ptolemy tried to obtain this law experimentally
but failed to do so.

The law was first found by the Dutch scientist Snel. Snel’s name is not as
well known today as the names of his great contemporaries Descartes, Huygens,
and Fermat. Today Snel’s fame rests solely on his experimental discovery
of the law of refraction of light, a discovery that remained unpublished in
his lifetime. In his time Snel was very famous. Kepler regarded him as “the
glory of the geometers [mathematicians] of our age.”

Snel’s law of refraction can be stated as follows. 

Let $A_1OB_1$ and $A_2OB_2$ be two rays (going “from above to below”) that
refract at the point $O$. (See Figure 3.1.) The angles $\alpha_1$ and $\alpha_2$ formed by
the vertical $OC$ and the respective lines $A_1O$ and $A_2O$ are called incidence
angles (a term with which you should already be familiar). The angles $\beta_1$
and $\beta_2$ formed by the vertical $OD$ and the respective lines $B_1O$ and $B_2O$
are called refraction angles. Snel showed that

$$\frac{\sin\alpha_1}{\sin\beta_1} = \frac{\sin\alpha_2}{\sin\beta_2},$$

that is, the ratio of the sine of the incidence angle to the sine of the refraction
angle is a constant that is independent of the incidence angle.

Descartes, one of the greatest French thinkers and scholars, arrived at the 
same law independently of Snel. In the last story in Part One, we will have 
reason to ponder the question of “whether geniuses err.” Well, Descartes was 
one of the “erring” geniuses. Out of his “errors,” scattered over the fields of 
science, have grown many life-giving shoots. 

Descartes deduced the law of refraction from his conceptions of the prop- 
agation of light rays. These conceptions have not withstood the test of time, 
although they led later to the law of conservation of momentum. 

Descartes’ theory implied that the speed of light is greater in a denser 
medium, such as water, than in a less dense medium such as air. Many 
other scientists doubted this. Fermat explained the law of refraction from 
the opposite assumption, that light moves more slowly in a denser medium. 

Fermat and Descartes were both Frenchmen, as well as contemporaries.
They often engaged in arguments in the search for scientific truth. This was
one such argument. In this case Fermat turned out to be correct. Experiments
showed that the denser the medium, the slower the speed of light.

To explain the law of refraction of light, Fermat advanced an extremal 
principle for optical phenomena. It was later named for him. This principle 
states that, in an inhomogeneous medium, light travels from one point to 
another along the path requiring the shortest time. 

Fermat’s principle allows the precise formulation and solution of a min- 
imum problem that leads to the derivation of Snel’s law. Specifically, this 
principle requires the computation of the minimum of the following function 
of one variable (see Figure 3.2): 


(1) $$f(x) = \frac{\sqrt{a^2 + x^2}}{v_1} + \frac{\sqrt{b^2 + (d - x)^2}}{v_2}.$$


It is worth noting that, at the time that he advanced his extremal principle 
(approximately 1660), Fermat already had at his disposal an algorithm for 
finding maxima and minima of functions that was equivalent to setting the 
derivative equal to zero. The use of derivatives so simplifies the derivation 
of Snel’s law that it now can be carried out by high school students. Fermat 
himself obtained the required result in a far more elaborate way. It is natural 
to ask: Why did Fermat not use his algorithm? The answer is very simple: 
Fermat could apply his method to polynomials—and here he actually antic- 
ipated the notion of a derivative—but he did not know how to apply it to 
radical expressions. That is why the deduction of Snel’s law using derivatives 
was first accomplished by Leibniz, who introduced this concept in the very 
same work of 1684 in which he laid the foundations of the grandiose edifice 
of mathematical analysis. 
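
The minimum problem just described can also be explored numerically. The following Python sketch is an illustration under stated assumptions, not Fermat’s or Leibniz’s method: it takes a and b to be the distances of the two points from the boundary line, d their horizontal separation, x the position of the refraction point, and v1, v2 the speeds of light in the two media (Figure 3.2 is not reproduced here, so these names are assumptions). It locates the zero of the derivative of the travel time by bisection and then compares the ratio of the sines with v1/v2.

```python
import math

def travel_time(x, a, b, d, v1, v2):
    """Time along the broken path that crosses the boundary at position x."""
    return math.hypot(a, x) / v1 + math.hypot(b, d - x) / v2

def refraction_point(a, b, d, v1, v2, iterations=200):
    """Zero of the derivative of travel_time on [0, d], found by bisection.
    The derivative equals sin(alpha1)/v1 - sin(alpha2)/v2 and is increasing in x."""
    def derivative(x):
        return x / (v1 * math.hypot(a, x)) - (d - x) / (v2 * math.hypot(b, d - x))
    lo, hi = 0.0, d
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if derivative(mid) < 0:
            lo = mid          # the minimum lies to the right of mid
        else:
            hi = mid          # the minimum lies to the left of mid
    return (lo + hi) / 2

def sine_ratio(x, a, b, d):
    """Ratio of the sine of the incidence angle to the sine of the refraction angle."""
    sin1 = x / math.hypot(a, x)
    sin2 = (d - x) / math.hypot(b, d - x)
    return sin1 / sin2

if __name__ == "__main__":
    a, b, d, v1, v2 = 1.0, 1.0, 2.0, 1.0, 0.75
    x = refraction_point(a, b, d, v1, v2)
    print(x, sine_ratio(x, a, b, d), v1 / v2)
```

At the computed point the two printed ratios agree, which is Snel’s law in the form derived from Fermat’s principle.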

Thus, Fermat deduced Snel’s law from his extremal principle, but his
solution was very complicated. A far simpler solution, also based on Fermat’s
principle, was given by Huygens, yet another scientific genius of the
seventeenth century and the author of the wave theory of light.

Before reproducing Huygens’ solution let’s state the problem precisely. 

We are given two points $A$ and $B$ on opposite sides of a horizontal line $l$ separating
two media. It is required to find a point $D$ on $l$ such that the time it takes for a
light ray to traverse the path $ADB$ is a minimum, provided that the velocity
of propagation of light is $v_1$ in the upper medium and $v_2$ in the lower one
(Figure 3.2). Note that (1) is a mathematical reformulation of this problem
and that this problem is very similar to Heron’s problem.


Huygens’ solution. Let $D$ (see Figure 3.2) be a point at which

(2) $$\frac{\sin\alpha_1}{\sin\alpha_2} = \frac{v_1}{v_2}.$$


We will show that for any other point $D' \ne D$ the time of traversal of the
path $AD'B$ is greater than the time of traversal of the path $ADB$. To this
end we erect perpendiculars to the line $AD$ at $A$ and $D$, respectively. Let
$P$ be the point of intersection of $AD'$ and the perpendicular at $D$. We
draw a line through $D'$ parallel to $AD$ and denote its points of intersection
with the perpendiculars (to $AD$) at $D$ and $A$ by $P'$ and $R$, respectively.
Finally, we drop the perpendicular $D'Q$ from $D'$ to $DB$. From Figure
3.2 we see that the angles $PDD'$ and $D'DQ$ are equal to $\alpha_1$
and $\pi/2 - \alpha_2$, respectively. Hence


(3) $$|D'P'| = |D'D|\sin\alpha_1, \quad |DQ| = |DD'|\sin\alpha_2.$$


Now we compare the traversal times along the paths $ADB$ and $AD'B$.

The relation (3) and the inequalities $|AP| > |AD|$, $|D'P| > |D'P'|$, and
$|D'B| > |BQ|$ (inclined segments are longer than perpendicular ones) imply
that

$$\frac{|AD'|}{v_1} > \frac{|AD| + |P'D'|}{v_1} = \frac{|AD|}{v_1} + \frac{|DD'|\sin\alpha_1}{v_1},$$

$$\frac{|D'B|}{v_2} > \frac{|BQ|}{v_2} = \frac{|DB| - |DQ|}{v_2} = \frac{|DB|}{v_2} - \frac{|DD'|\sin\alpha_2}{v_2}.$$

The latter inequalities and (2) show that

$$\frac{|AD'|}{v_1} + \frac{|D'B|}{v_2} > \frac{|AD|}{v_1} + \frac{|DB|}{v_2}.$$

Thus the refraction point that minimizes the time of traversal of the broken
path from $A$ to $B$ is characterized by the fact that the ratio of the sines of
the angles of incidence and refraction is equal to $v_1/v_2$, that is, to a constant.
But this is just Snel’s law.

What underlies Fermat’s principle is the assumption that light is propagated
along certain lines. This idea ties in most readily with the corpuscular
theory of light that regards light as a flow of particles. We owe to Huygens
another explanation of the propagation and refraction of light, based on the
notion of light as a wave whose front moves in time.

A wavefront $S_t$ is the set of points that can be reached by light from some
source in time $t$. For example, if at time zero the source is a point and the
medium is homogeneous, then at time $t$ the front $S_t$ will be a sphere of
radius $vt$ centered at the light source. With increasing distance from the
source the spherical wave becomes ever more planelike. Thus if we think of
the source as infinitely distant, then the wavefront will be a plane moving
uniformly with velocity $v$.

To determine the motion of a wavefront in more complex cases Huygens
used the following rule, now known as “Huygens’ principle”: every point of a
wavefront $S_t$ itself becomes a secondary source, and in time $\Delta t$ we obtain a
family of wavefronts from all these secondary sources, and the actual wavefront
$S_{t+\Delta t}$ at time $t + \Delta t$ is the envelope of this family, that is, the surface tangent
to all secondary wavefronts. (See Figure 3.3.)

Let’s use Huygens’ principle to deduce Snel’s law. 

Consider a parallel pencil of light rays falling on a plane boundary $l$ separating
two homogeneous media. As before, we will suppose that $l$ is horizontal
and that the light falls from above. (See Figure 3.4.) We will denote the
velocities of propagation of light above and below $l$ by $v_1$ and $v_2$ and the
angles of incidence and refraction by $\alpha_1$ and $\alpha_2$. The wavefront $A_1A'A$ is
moving with velocity $v_1$ and at a certain moment $t$ reaches the boundary
$l$ at the point $D$. Then $D$ becomes a secondary source whose wave propagates
in the lower medium with velocity $v_2$. Light reaches the point $D_1$
at time $t_1 = t + |B_1D_1|/v_1 = t + (|DD_1|\sin\alpha_1)/v_1$, and an intermediate
point $D'$ on the segment $DD_1$ at the moment $t' = t + (|DD'|\sin\alpha_1)/v_1$.
By time $t_1$, the spherical wave due to the secondary source $D$ will have
radius $r_1 = v_2(t_1 - t) = |DD_1|(v_2/v_1)\sin\alpha_1$ and the wave due to $D'$ will
have radius $r' = v_2(t_1 - t') = |D'D_1|(v_2/v_1)\sin\alpha_1$. Since the angles $DD_1C$
and $D'D_1C'$ are equal (their respective sines are $r_1/|DD_1|$ and $r'/|D'D_1|$, and
both numbers are equal to $(v_2/v_1)\sin\alpha_1$), the tangents $D_1C$ and $D_1C'$ to
these spheres coincide. But $D'$ is an arbitrary point on $DD_1$. This means
that all secondary waves are tangent to the line $CD_1$ at $t_1$. The latter line
forms with $l$ an angle $\alpha_2$ such that $\sin\alpha_2 = (v_2/v_1)\sin\alpha_1$. Thus we have
again obtained Snel’s law.

The idea of a wavefront can also be illustrated with examples that are not
derived from optical problems. Consider a traveller who begins to walk at
a point $A$ of a rectilinear highway that bounds a meadow. The traveller
tries to reach a point $B$ in the meadow as quickly as possible. His speed
$v$ in the meadow is half his speed on the highway. If the traveller walks all
the time in the meadow, then in a unit of time he can reach any point of
a circle of radius $v$. If he walks all the time on the highway then he will
cover the distance $2v$. Suppose he walks partly on the highway and partly
in the meadow. Then the set of points that he can reach in a unit of time is
a “wavefront” consisting of two segments connected by a circular arc.¹

Now let’s touch once more on the subject of extremal principles.


In this story I have given two derivations of the law of refraction of light.
There is a fundamental difference between them. Fermat’s approach sheds
no light whatsoever on the true essence of the occurring phenomenon. In this
approach, one postulates a certain property of the trajectories and shows that
it is borne out by experiment. In Huygens’ approach, the point of departure
is the description of the physical nature of the phenomenon.

This descriptive duality is typical of the natural sciences. While the laws
of nature admit of interpretations based on physical models, they are also
derivable from extremal principles.

The two approaches described in this story have played a very important
role in the history of the calculus of variations and of the whole theory of
extremal problems. In fact, every problem in the calculus of variations and
in optimal control can be investigated in two ways. One way is to investigate
its extremal trajectories (in the manner of Fermat). This leads to the
Euler-Lagrange theory (which we will touch upon in the fourteenth story).
The other way (the one due to Huygens) is to investigate bundles of extremal
trajectories. This procedure leads to analogs of wavefronts, to the theory
developed by Hamilton and Jacobi in the nineteenth century, and to the
investigation of problems of optimal control by means of the methods of
dynamic programming, first developed (relatively recently) by the American
scientist Richard Bellman.


¹The following justification of this claim is due to Prof. E. Barbeau.

To fix ideas, suppose the traveller is at $A(0, 0)$, that he can walk 1 unit per second along
the x-axis and 1/2 unit per second along any other path in the plane.

The most efficient paths to consider are those beginning along the x-axis and then going
straight to $B$.

How far can the traveller go in 1 second? Suppose he leaves the x-axis at the point $(t, 0)$,
$0 \le t \le 1$ (we consider only the positive quadrant). Then he winds up on the circle

$$C_t:\ (x - t)^2 + y^2 = \left(\frac{1 - t}{2}\right)^2.$$

Let us fix a particular $x \in [0, 1]$ and see which circle will maximize $y$. More specifically,
we want to maximize

$$\begin{aligned}
\varphi_x(t) &= \left(\frac{1 - t}{2}\right)^2 - (x - t)^2 = \frac14\left[(1 - 4x^2) + 2t(4x - 1) - 3t^2\right] \\
&= \frac14\left[\frac{4(1 - x)^2}{3} - 3\left(t - \frac{4x - 1}{3}\right)^2\right] \\
&= \frac{(1 - x)^2}{3} - \frac34\left(\frac{4x - 1}{3} - t\right)^2.
\end{aligned}$$

If $0 \le x \le \frac14$, then $\varphi_x(t)$ has its maximum at $t = 0$, so

$$y^2 \le \frac14 - x^2, \quad\text{or}\quad x^2 + y^2 \le \frac14,$$

and $(x, y)$ lies within the circle with center $(0, 0)$ and radius $\frac12$.

If $\frac14 \le x \le 1$, then $\varphi_x(t)$ has its maximum at $t = (4x - 1)/3$, so

$$y^2 \le \frac{(1 - x)^2}{3},$$

and $(x, y)$ is under the line $y = (1 - x)/\sqrt3$.

(Figure: the boundary of the reachable region in the positive quadrant, a circular arc from $(0, 1/2)$ to $(1/4, \sqrt3/4)$ followed by a segment from $(1/4, \sqrt3/4)$ to $(1, 0)$.)
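
The traveller’s “wavefront” from this story can be checked numerically. The sketch below is an illustration only, with hypothetical function names; it uses the normalization of the footnote (speed 1 along the x-axis, speed 1/2 elsewhere, one unit of time). For a fixed x it maximizes the squared height reachable after leaving the highway at the point (t, 0), using a grid of values of t, and compares the result with the closed-form boundary: a circular arc of radius 1/2 for x up to 1/4 and the line y = (1 - x)/√3 beyond.

```python
import math

def phi(x, t):
    """Squared height y**2 reached above the point x when leaving the highway at (t, 0)."""
    return ((1 - t) / 2) ** 2 - (x - t) ** 2

def boundary_numeric(x, steps=3000):
    """Largest reachable y above x, maximizing phi over a grid of t in [0, 1]."""
    best = max(phi(x, k / steps) for k in range(steps + 1))
    return math.sqrt(max(best, 0.0))

def boundary_closed_form(x):
    """Circular arc for 0 <= x <= 1/4, straight segment for 1/4 <= x <= 1."""
    if x <= 0.25:
        return math.sqrt(0.25 - x * x)
    return (1 - x) / math.sqrt(3)

if __name__ == "__main__":
    for x in (0.0, 0.1, 0.25, 0.5, 0.9, 1.0):
        print(x, boundary_numeric(x), boundary_closed_form(x))
```

The two columns agree to within the grid resolution, and the arc and the segment meet at the point (1/4, √3/4).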


The Fourth Story 


4


Maxima and Minima in Geometry 


The history of science contains many examples of 
applications of pure geometry and of its usefulness. 


P. Laplace 


Archimedes will be remembered when Aeschylus will 
have been forgotten, for languages die while mathe- 
matical ideas do not. 


G. H. Hardy 


Inexhaustible supplies of precious problems on maxima and minima are 
hidden in the depths of the oldest mathematical discipline—geometry. 

Geometric problems on maxima and minima are found in the works of 
each of the three greatest mathematicians of antiquity—Euclid, Archimedes, 
and Apollonius. They were also paid tribute by the most prominent mathe- 
maticians of the Renaissance—Viviani, Torricelli, Fermat, and others. Even 
today, interest in such problems remains high. 


1. Euclid’s problem. In Euclid’s Elements, the first scientific monograph 
and textbook in the history of mankind, which was written in the fourth cen- 
tury B.C., there is just one maximum problem. The following is its modern 
formulation: 

In a given triangle $ABC$ inscribe a parallelogram $ADEF$ ($EF \parallel AB$, $DE \parallel AC$)
of maximal area. (See Figure 4.1 on page 28.)

I will give one of the possible geometric solutions of this problem; it goes
back to Euclid’s solution in the Elements. Specifically, I will prove that what
characterizes the required parallelogram is that $D$, $E$, and $F$ are the midpoints
of the appropriate sides.

Let $AD'E'F'$ be a parallelogram inscribed in $ABC$ that is different from
$ADEF$. Let $G'$ denote the point of intersection of the lines $D'E'$ and $EF$
and $G$ the point of intersection of the lines $DE$ and $E'F'$.

We wish to show that the area of the parallelogram $AD'E'F'$ is less
than the area of the parallelogram $ADEF$ by the area of the parallelogram
$EG'E'G$. To this end, we drop the altitude from the point $B$ in the triangle
$ABC$ and denote its length by $H$. We denote the length of the side $AC$ by
$b$ and the length of the altitude in the triangle $GE'E$ from the point $E'$ by
$H_1$.

In view of the similarity of the triangles $GE'E$ and $ABC$ ($E'G \parallel AB$ and
$GE \parallel AC$), we have

$$\frac{H_1}{|GE|} = \frac{H}{b} \iff \frac{H_1}{H/2} = \frac{|GE|}{b/2}.$$


From this relation it follows that the area of the parallelogram $D'G'ED$,
whose altitude is $H_1$ and the length of whose side $DE$ is $b/2$, is equal to
the area of the parallelogram $EGF'F$, whose altitude is $H/2$ and the length
of whose side $F'F$ is $|GE|$. It follows that the area of the parallelogram
$ADEF$ is equal to the area of the figure $AD'G'EGF'$ that is greater than the
area of $AD'E'F'$ by the area of the parallelogram $GE'G'E$. This completes
the solution of the problem.
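
Euclid’s result can also be checked by direct computation. In the sketch below (an illustration with hypothetical names, not Euclid’s method) the vertex D divides the side AB in the ratio s : (1 - s), so that the inscribed parallelogram has sides s·AB and (1 - s)·AC issuing from A. The area is computed with a cross product and compared over a grid of values of s.

```python
def cross(u, v):
    """z-component of the cross product of two plane vectors."""
    return u[0] * v[1] - u[1] * v[0]

def triangle_area(A, B, C):
    AB = (B[0] - A[0], B[1] - A[1])
    AC = (C[0] - A[0], C[1] - A[1])
    return abs(cross(AB, AC)) / 2

def parallelogram_area(A, B, C, s):
    """Area of the inscribed parallelogram ADEF with |AD| = s|AB| (so |AF| = (1 - s)|AC|)."""
    AD = (s * (B[0] - A[0]), s * (B[1] - A[1]))
    AF = ((1 - s) * (C[0] - A[0]), (1 - s) * (C[1] - A[1]))
    return abs(cross(AD, AF))

def best_fraction(A, B, C, steps=100):
    """Value of s on a uniform grid that gives the largest parallelogram."""
    return max((k / steps for k in range(1, steps)),
               key=lambda s: parallelogram_area(A, B, C, s))

if __name__ == "__main__":
    A, B, C = (0.0, 0.0), (1.0, 3.0), (4.0, 0.0)
    s = best_fraction(A, B, C)
    print(s, parallelogram_area(A, B, C, s), triangle_area(A, B, C))
```

The maximum occurs at s = 1/2, where the parallelogram has half the area of the triangle. For any other s the shortfall equals 2(s - 1/2)² times the area of the triangle, which is the area of the small parallelogram that appears in the proof.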


2. The problem of Archimedes. We have already mentioned that some ancient
authors attribute to Archimedes (287-212 B.C.) the proof of the isoperimetric
property of the circle and the isoepiphanic property of the sphere.
But in the surviving works of Archimedes there is no reference to the isoperimetric
problem, and his contribution to its solution is thus far unknown. On
the other hand, in his work On the sphere and cylinder, Archimedes poses
and solves the following problem:

Among all spherical segments with the same spherical area, find the one
that encloses the largest volume.

FIGURE 4.2


We will first give a solution which, while entirely based on Archimedes’ 
ideas, nevertheless depends strongly on algebra. We will then give the same 
solution in the purely geometric language used by its author. 

Consider a sphere of radius $R$ and its spherical segment $BAB'$ of height
$h$. (See Figure 4.2.) Together with the segment $BAB'$ we consider the
hemisphere $EDE'$ with the same lateral surface. We denote its radius by $r$.
We know that the volume $V$ of the spherical segment is $\pi h^2(R - h/3)$, its
lateral surface area is $2\pi Rh$, the volume $\hat V$ of the hemisphere is $(2/3)\pi r^3$,
and its lateral surface area is $2\pi r^2$. From the equality of the lateral surface
areas of the segment and the hemisphere, we have


(1) $$r^2 = Rh.$$

We will prove the inequality

(2) $$(2R - r)r > (2R - h)h \quad\text{for } h \ne R.$$


We will consider two cases: (a) $h < R$ and (b) $h > R$. In case (a),

$$r^2 = Rh > h^2 \Rightarrow r > h \Rightarrow R - r < R - h \Rightarrow (2R - r)r = R^2 - (R - r)^2 > R^2 - (R - h)^2 = (2R - h)h$$

(note that $r^2 = Rh < R^2$, so that $r < R$). In case (b),

$$r^2 = Rh < h^2 \text{ and } r^2 = Rh > R^2 \Rightarrow r - R < h - R \Rightarrow (2R - r)r = R^2 - (R - r)^2 > R^2 - (R - h)^2 = (2R - h)h.$$

Using (1) and (2) and multiplying by $\pi/3$ we obtain

(3) $$\frac{\pi}{3}\cdot 2Rr > \frac{\pi}{3}(3R - h)h.$$

Multiplying (3) by $h$ and replacing $Rh$ by $r^2$, we arrive at the required inequality

$$\hat V = \frac{2}{3}\pi r^3 = \frac{\pi h}{3}\cdot 2Rr > \pi h^2\left(R - \frac{h}{3}\right) = V.$$


Thus a hemisphere whose lateral surface is equal to that of a spherical segment
encloses a greater volume than the segment. To quote Archimedes, “of
all spherical segments bounded by equal surfaces the largest is a hemisphere.”
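
Archimedes’ conclusion can be confirmed by a short computation. The sketch below is an illustration only, with hypothetical function names. It fixes the lateral surface area by fixing the radius r of the comparison hemisphere, lets the height h of the segment vary (the radius of the sphere is then R = r²/h by the relation r² = Rh, and h can be at most r√2, the case of the whole sphere), and compares the volume of each segment with that of the hemisphere.

```python
import math

def segment_volume(R, h):
    """Volume of a spherical segment of height h cut from a sphere of radius R."""
    return math.pi * h * h * (R - h / 3)

def volume_for_fixed_area(r, h):
    """Volume of the segment of height h whose lateral area equals 2*pi*r**2.
    The sphere radius follows from r**2 = R*h."""
    R = r * r / h
    return segment_volume(R, h)

def hemisphere_volume(r):
    return 2 * math.pi * r ** 3 / 3

if __name__ == "__main__":
    r = 1.0
    heights = [k * math.sqrt(2) * r / 1000 for k in range(1, 1001)]
    best_h = max(heights, key=lambda h: volume_for_fixed_area(r, h))
    print(best_h, volume_for_fixed_area(r, best_h), hemisphere_volume(r))
```

The largest volume on the grid occurs for h close to r, that is, for the hemisphere.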




FIGURE 4.3


Of all scientists, the genius of Archimedes, like that of Newton, most likely 
elicits the greatest admiration. We will re-solve our problem, but this time we 
will follow Archimedes’ thought almost literally (and include in parentheses 
the relevant algebraic relations). 

Archimedes could use neither the language of algebra—whose birth was to 
come 18 centuries later—nor algebraic computations. His language was that 
of geometry. Following Archimedes, we lay off on the line A’A (Figure 4.3) 
a segment [OH] so large that a cone of height HM and base radius MB 
has the same volume as the spherical segment BAB’. On the segment [OA’] 
produced we lay off the segment [A’K] of length equal to the radius R. 
From the equality of the volumes of the cone and the segment Archimedes 
obtains the proportion 

(4)    $\frac{|HM|}{|AM|} = \frac{|KM|}{|A'M|}.$

We use the familiar formulas for the volume $V_c$ of a cone and $V_s$ of a 
segment to check this equality: 

(5)    $V_c = \frac{\pi}{3}|HM||MB|^2 = \frac{\pi}{3}|HM||MA'||MA| = V_s = \frac{\pi}{3}(3R - h)h^2 = \frac{\pi}{3}|KM||AM|^2.$

This check uses the fact that the length of the segment [MB] is the geometric 
mean of the lengths of the segments [A’M] and [MA]. The equality (4) 
follows directly from (5). 

The equality of the surface areas of the hemisphere and the segment 
implies that 

(6)    $|AB| = |ED|.$

Indeed, $|ED| = r\sqrt{2}$ and $|AB|^2 = |AA'||AM|$ (the familiar property of a 
triangle inscribed in a circle and based on a diameter), so that, writing $S_s$ and 
$S_h$ for the lateral surface areas of the segment and the hemisphere, 
$\pi|AB|^2 = 2\pi Rh = S_s = S_h = 2\pi r^2 = \pi|ED|^2 \Rightarrow |AB| = |ED|$. 

Now Archimedes lays off the segment [AS] equal in length to [CD] 
and proves the inequality (2): $|A'S||AS| > |A'M||AM|$ ($\Leftrightarrow (2R - r)r > (2R - h)h$). 
Archimedes justifies this geometrically: of two rectangles with 
the same perimeter, the one with the larger area is the one with the greater 
small side. 

In view of the equality of the lateral surface areas of the segment and the 
hemisphere, we have 


$|AS|^2 = |AM||A'K|$  ($\Leftrightarrow r^2 = Rh$). 

This equality and the preceding inequality yield 

$|AS||AA'| > |KM||AM|$  ($\Leftrightarrow 2Rr > (3R - h)h$). 

Multiplying by |AM| and using (5), we obtain 

(7)    $|AS||AA'||AM| > |KM||AM|^2$  ($\Leftrightarrow 2Rrh > (3R - h)h^2$). 

We showed earlier that 

$|KM||AM|^2 = |HM||MB|^2$ [see (5)], 
$|AA'||AM| = |AB|^2 = |ED|^2$ [see (6)]. 

By construction, |AS| = |CD|. These equalities and (7) yield 

$V_h = \frac{\pi}{3}|CD||ED|^2 > \frac{\pi}{3}|HM||MB|^2 = V_c = V_s$ 

($\Leftrightarrow V_h = \frac{2}{3}\pi r^3 > \frac{\pi}{3}(3R - h)h^2 = V_s$). 


This completes the proof. 

It may be appropriate to recall that all the formulas we have used (the 
volume of a cone, a sphere and a spherical segment, the surface area of the 
sphere and a segment) were first obtained by Archimedes in his work On 
the sphere and cylinder. It is difficult not to agree with Hardy (see epigraph) 
that Archimedes will be famous as long as mathematics survives. (But I am 
reluctant to agree with the second half of his sentiment. Aeschylus, too, will 
remain famous!) 

We will postpone discussion of the problem posed and solved by Apollo- 
nius until the thirteenth story. 


3. Steiner’s problem. In the plane of a triangle, find a point such that the 
sum of its distances from the vertices of the triangle is minimal. 

This problem was discussed in another formulation in the first story. It, 
too, has a long history, although not as long as that of Heron’s problem or 
the classical isoperimetric problem. It was included in Viviani’s On maximal 
and minimal values (1659), the first work devoted to our subject. 

Cavalieri and Torricelli were also interested in this problem. (The solution 
of this problem, that is, the point where the required minimum is attained, 
is called the Torricelli point; see, for example, Zetel’s book [5R].) Coxeter 
claims that Fermat also studied this problem. 

(Viviani, Cavalieri, and Torricelli were the greatest Italian mathematicians 
of the seventeenth century. Cavalieri’s principle served as a precursor of 
the integral calculus; Torricelli is known for the discovery of atmospheric 
pressure. Torricelli and Viviani were students of Galileo. In fact, it was to 
Viviani that the blind Galileo dictated his Conversations on mechanics near 
the end of his life.) 

The interest of so many eminent scientists in so elementary a problem 
provides yet another confirmation that aesthetic motives often provide the 
stimulus for creativity. 

In the nineteenth century Steiner devoted much attention to this and to 
a series of similar problems. They are frequently referred to as Steiner’s 
problems. We will also use this name. 

We will now give the well-known geometric solution of Steiner’s problem 
for triangles whose angles do not exceed 120°. 

Suppose that the angle C in the triangle ABC (Figure 4.4) is ≥ 60°. 
We rotate the triangle ABC about C through 60° and obtain the triangle 
A'B'C. Let D be any point in triangle ABC and D’ be its image under 
our rotation. Then the sum of lengths |AD| + |BD| + |CD| is equal to the 
length of the polygonal line |BD| + |DD'| + |D'A'|, since |AD| = |A'D'| and 
the triangle CDD' is equilateral, so that |CD| = |DD'|. 

Now let D be the Torricelli point, that is, the point from which all the 
sides of the triangle are seen at an angle of 120°, and let D’ be the image 
of D under our rotation. It is easy to see that the points B, D, D' and 
A’ are collinear. Since the segment [BA’] is the shortest line joining B to A’, 
this means that the Torricelli point is the solution of our 
problem. We leave it to the reader to show that, if the obtuse angle is greater 
than 120°, then its vertex is the solution of the problem. 
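
A numerical illustration may help here. The sketch below is ours: it locates the minimizing point with the Weiszfeld iteration (a standard numerical method, not the geometric argument of the text), then checks that the sides are seen from it at 120° and that the minimal sum equals the length |BA'| obtained from the 60° rotation about C. The sample triangle, the helper names, and the iteration count are assumptions made for illustration; we also assume that no iterate lands exactly on a vertex.

```python
import math

def dist(P, Q):
    return math.hypot(P[0] - Q[0], P[1] - Q[1])

def total(P, pts):
    """Sum of the distances from P to the given points."""
    return sum(dist(P, Q) for Q in pts)

def torricelli_point(A, B, C, iterations=500):
    """Weiszfeld iteration started at the centroid (all angles assumed < 120 degrees)."""
    pts = [A, B, C]
    P = (sum(p[0] for p in pts) / 3, sum(p[1] for p in pts) / 3)
    for _ in range(iterations):
        w = [1 / dist(P, Q) for Q in pts]          # weights 1/distance
        P = (sum(wi * Q[0] for wi, Q in zip(w, pts)) / sum(w),
             sum(wi * Q[1] for wi, Q in zip(w, pts)) / sum(w))
    return P

def angle_deg(P, U, V):
    """Angle UPV in degrees."""
    ux, uy = U[0] - P[0], U[1] - P[1]
    vx, vy = V[0] - P[0], V[1] - P[1]
    c = (ux * vx + uy * vy) / (math.hypot(ux, uy) * math.hypot(vx, vy))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))

def rotate(P, center, degrees):
    """Rotate P about center through the given angle."""
    t = math.radians(degrees)
    dx, dy = P[0] - center[0], P[1] - center[1]
    return (center[0] + dx * math.cos(t) - dy * math.sin(t),
            center[1] + dx * math.sin(t) + dy * math.cos(t))

A, B, C = (0.0, 0.0), (4.0, 0.0), (1.0, 3.0)
D = torricelli_point(A, B, C)
# Of the two possible 60-degree rotations, the one turning A away from B
# gives the longer segment [BA'], which is the one used in the text.
BA_prime = max(dist(B, rotate(A, C, 60)), dist(B, rotate(A, C, -60)))
print(angle_deg(D, A, B), angle_deg(D, B, C), angle_deg(D, C, A))
print(total(D, [A, B, C]), BA_prime)
```

The three angles come out at 120°, and the two printed lengths agree.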

The fourth and fifth problems in the first story closely resemble Steiner’s 
problem. We leave it to the reader to think through the fourth problem. The 
answer to this problem can be stated as follows: If the points A, B, C, 
and D form a convex quadrilateral, then the required point is the point of 
intersection of its diagonals; otherwise it is the vertex of the largest angle. 
For problem 5, the answer is “yes.” For example, in the case of a square 
there are two extremal nets, represented in Figure 4.5. The sum of the lengths 
of these nets is less than the sum of the diagonals. 
Let’s now state two familiar geometric minimum problems. 


4. Least-area problem. Given an angle and a point in its interior, pass a 
line through the given point that cuts off from the angle a triangle of minimal 
area. 

We will show that the required line is such that its segment in the interior 
of the angle is halved by the given point. Such a line is easy to construct. 
One way is to join the given point M (Figure 4.6) to the vertex A, to lay 
off on the segment [AM] produced a segment [MA’] of length equal to the 
length of [AM], and to pass through the point A’ a line parallel to AC. 
Let D be the point of intersection of this line and the side AB. It is easy 
to see that the line joining D to M and intersecting AC at a point E has 
the required property |DM| = |ME| (the triangles MDA’ and MEA are 
congruent). There are also other constructions of this line. 

It remains to show that the line just constructed yields the required min- 
imum. To this end, we pass through M some line D'E’. We assume for 
definiteness that the point E’ is to the left of E. Then the area of the 
triangle AD’E’ is equal to the area of the triangle ADE minus the area of the 
triangle EME’ plus the area of the triangle MDD’. Let F be the point 
of intersection of the lines DA’ and D’E’. Then the triangles EME’ and 
MDF are congruent. Since the latter triangle is contained in the triangle 
DD’M, it follows that the area of the triangle ADE is smaller than the area 
of the triangle AD’E’. 
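
The midpoint property can also be observed numerically. In the sketch below (ours, with illustrative coordinates) the vertex A is the origin, the side AC is the positive x-axis, and the side AB makes an angle theta with it. The point E = (e, 0) runs along AC, the line EM meets AB at D, and we look for the position of E that gives the smallest area. The names `cut`, `theta`, and the sample point M are assumptions, not notation from the text.

```python
import math

def cut(theta, M, e):
    """Line through E = (e, 0) and M: return its point D on AB and the area of ADE."""
    mx, my = M
    e0 = mx - my / math.tan(theta)   # the parallel to AB through M meets AC at (e0, 0)
    t = e / (e - e0)                 # D = E + t*(M - E); requires e > e0
    D = (e + t * (mx - e), t * my)
    return D, 0.5 * e * D[1]         # base |AE| = e, height = y-coordinate of D

theta, M = math.radians(60), (2.0, 1.0)
e0 = M[0] - M[1] / math.tan(theta)
steps = 100_000
grid = [e0 + 3 * e0 * k / steps for k in range(1, steps + 1)]
e_best = min(grid, key=lambda e: cut(theta, M, e)[1])
D, area = cut(theta, M, e_best)
midpoint = ((e_best + D[0]) / 2, D[1] / 2)
print(midpoint, area)                # the midpoint of [DE] should be close to M
```

At the minimum the midpoint of [DE] coincides with M up to the grid resolution.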


5. Least-perimeter problem. Given an angle and a point in its interior, pass 
a line through the given point that cuts off from the angle a triangle of minimal 
perimeter. 

We will show that the required line DE has the property that the excircle 
of the triangle ADE is tangent to the segment [DE] at the point M. It 
is easy to construct such a line. To this end we inscribe in the angle BAC 
(Figure 4.7(a)) a circle and denote by M’ the point of intersection of the 
line AM and the circle that is closest to A. Next we draw a tangent to our 
circle at the point M’ and then a line through M parallel to that tangent. 
This is the required line. 

It remains to show that the line just constructed yields the required mini- 
mum. To this end we pass through M some line D'E’. (See Figure 4.7(b).) 
Consider the excircle of the triangle AD’E’ touching the segment [D’E’] at 
the point F and the sides AB and AC of the angle at the points D” and 
E”, respectively. The lengths of the segments [E’F] and [E’E”] are equal, 
since they are the lengths of tangent segments drawn from the same point. 
The same is true of the lengths of the segments [D’F] and [D’D”]. Hence 
the perimeter of the triangle AD’E’ is equal to the sum of the lengths of the 
segments [AD”] and [AE”]; of course, |AD”| = |AE”|. This means that 
for the perimeter of triangle AD’E’ to be minimal the points E” and D” 
must be as close to A as possible. This will take place just when the excircle 
“leans” on the point M, that is, when the segment of the line through the 
point M touches that circle at M. 

We have not yet paid the “geometric debt” of discussing problems 1 and 
2 from the first story. Their solutions are clear from Figures 4.8 and 4.9, 
respectively. If the constructions shown in these figures are impossible, then, 
in each case, the required point coincides with the vertex of the relevant 
angle. 

Our topic is inexhaustible. The number of books devoted to it is tremen- 
dous. We will mention a few of them. They contain problems to suit any 
taste. After studying Part Two the reader can try to find analytic solutions to 
problems. A few books containing geometric extremal problems are Courant 
and Robbins [3], Coxeter [4], Niven [11], Zetel' [5R], Sarygin [10R], Sklarskii, 
Centsov, and Yaglom [12R]. The book of Boltyanskii and Yaglom [13] pays 
special attention to such problems as well. 

We will analyze additional geometric problems in the thirteenth story. 


The Fifth Story 




Maxima and Minima 
in Algebra and in Analysis 


Algebra is generous. She often gives more than is 
asked of her. 


D’Alembert 


1. Tartaglia’s problem. We will begin our story with a discussion of the 
following problem, posed by Niccolo Tartaglia (1500-1557). 

To divide the number 8 into two parts such that the result of multiplying 
the product of those parts by their difference is maximal. 

We will attempt to reconstruct the chain of reasoning that led Tartaglia to 
the solution of his problem. Before we do so, it will be helpful to say a few 
words about the history of his remarkable discovery of the rule for solving 
cubic equations by radicals. 

The first to solve the equation $x^3 + px + q = 0$ (for positive p and negative 
q) was Scipione del Ferro (1465?-1526). At that time the only admissible 
roots of an equation were its positive roots. Negative roots, and all the more 
so complex roots, were ignored. 

Del Ferro did not publish his discovery although he did show it to his asso- 
ciates. At that time “mathematical contests” were very popular. (In our own 
time this tradition has been revived in a somewhat different form. Groups 
rather than individual contestants enter the fray, for example, members of 
a boarding school, winners of a school olympiad and its judges, and so on.) 
One of those initiated into the secret of the solution of cubic equations de- 
cided to use it in order to prevail in such a contest. This competitor would 
undoubtedly have succeeded had he not been fated to encounter Niccolo 
Tartaglia. Tartaglia’s task was to solve 30 cubic equations for different val- 
ues of p and q. At first he was unaware that his opponent knew the secret 
of the general solution; he discovered this shortly before the deadline for the 
presentation of the solutions of the problems. Through a tremendous effort, 
Tartaglia managed, on his own, to find the general method eight days before 
the deadline. (For details, see Gindikin’s book [6].) Like del Ferro before 
him, Tartaglia obtained the following formula: 


(1)    $x = \sqrt[3]{-\frac{q}{2} + \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}} + \sqrt[3]{-\frac{q}{2} - \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}}.$

Formula (1) yields an expression for the positive root in del Ferro’s case (for 
$p > 0$ and $q < 0$). But it also yields an expression for a real root in other 
cases (for example, as we will see, when $p < 0$ and $q > 0$). This formula is 
usually called the Cardano formula in honor of the man who first published 
it. 

Tartaglia did not publish the formula himself, but in a number of his 
works he announced that he was able to solve problems of various kinds. 
One of these was the problem given earlier in this story. Without describing 
the solution, Tartaglia stated the answer in the following form: Halve the 
number 8; the square of that half augmented by a third of that square will 
be equal to the square of the difference of the two parts. In other words, if we 
denote the required numbers by a and b (a > b), then Tartaglia states that 
$(a - b)^2 = (8 \div 2)^2 + (8 \div 2)^2 \div 3 = 64/3$, so that $a - b = 8/\sqrt{3} \Rightarrow a = 4 + 4/\sqrt{3}$. 
We will see that Tartaglia was right. 

Following Zeuthen’s book [14], we will try to reconstruct Tartaglia’s train 
of thought that led him to the correct answer. Rather than tie ourselves 
down to the concrete number 8, we will solve the problem in general form. 
Let S denote the number to be divided. We saw that Tartaglia’s answer 
involves not the numbers a and b but rather their difference, which he 
almost certainly took as the unknown. If we set a − b = x, then a = 
(S + x)/2, b = (S − x)/2, so that we are looking for the maximum of the 
function $f(x) = x(S/2 + x/2)(S/2 - x/2) = (S^2 x - x^3)/4$. Let M denote 
the maximum in question (for x > 0). Then we obtain for x the equation 

(2)    $x\left(\frac{S}{2} + \frac{x}{2}\right)\left(\frac{S}{2} - \frac{x}{2}\right) = M \Leftrightarrow x^3 - S^2 x + 4M = 0.$


Unfortunately, here $p = -S^2 < 0$ and $q = 4M > 0$, so that equation 
(2) does not have the structure of del Ferro’s equation. On the other hand, 
equation (2) has a noteworthy special feature, namely that in addition to a 
negative root (that we denote by $-\beta$) it has a positive root of multiplicity two, 
that is, here the function and its derivative vanish. Figure 5.1 shows that for 
$m > M$ the equation $x^3 - S^2 x + 4m = 0$ has no positive roots, for $m < M$ 
it has two such roots, and for $m = M$ it has one positive root. We denote 
the positive root of equation (2) by $\alpha$, which is thus the required difference. 
We can therefore write down the identity 

$x^3 - S^2 x + 4M = (x + \beta)(x - \alpha)^2 = x^3 + (\beta - 2\alpha)x^2 + (\alpha^2 - 2\alpha\beta)x + \alpha^2\beta,$

which implies that $\beta = 2\alpha$, $p = -S^2 = \alpha^2 - 2\alpha\beta = \alpha^2 - 4\alpha^2 = -3\alpha^2$, 
$q = 4M = \alpha^2\beta = 2\alpha^3$. But then $q^2/4 + p^3/27 = 0 \Leftrightarrow (4M)^2/4 = S^6/27 \Leftrightarrow 
(2M)^2 = (S^2/3)^3$. It seems that Tartaglia believed that if in (1) $q^2/4 + 
p^3/27 = 0$, then this formula yields the expression for the negative root: 

$-\beta = 2\sqrt[3]{-\frac{q}{2}} = -2\sqrt[3]{2M} = -\frac{2S}{\sqrt{3}}.$

Whence 

$\alpha^2 = \left(\frac{\beta}{2}\right)^2 = \frac{S^2}{3} = \left(\frac{S}{2}\right)^2 + \frac{1}{3}\left(\frac{S}{2}\right)^2.$

For S = 8, we see that in order to find the square of the difference “we 
must halve the number 8 and augment the square of that half by a third of 
that square.” Thus, the problem is solved. 
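
Tartaglia's rule is easy to test by brute force. The sketch below (ours; the function name and the grid size are illustrative assumptions) computes the difference x = a − b from the rule and compares it with a direct scan of f(x) = ab(a − b) over 0 ≤ x ≤ S.

```python
import math

def product_times_difference(S, x):
    """f(x) = a*b*(a - b) with a = (S + x)/2 and b = (S - x)/2."""
    a, b = (S + x) / 2, (S - x) / 2
    return a * b * x

S = 8
# Tartaglia's rule: the square of the half, augmented by a third of that square.
x_rule = math.sqrt((S / 2) ** 2 + (S / 2) ** 2 / 3)
steps = 100_000
grid = [S * k / steps for k in range(steps + 1)]
x_scan = max(grid, key=lambda x: product_times_difference(S, x))
print(x_rule, x_scan)                          # both close to 8/sqrt(3)
print((S + x_rule) / 2, (S - x_rule) / 2)      # the two parts a and b
```

The scan and the rule agree to within the grid spacing.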

Many interesting problems on maxima and minima are concealed in var- 
ious “exact inequalities.” We will continue our story with a discussion of 
what may well be the oldest such inequality. 


2. The inequality of the arithmetic-geometric means for two numbers. Let 
a and b be nonnegative numbers. Their geometric mean is the number $\sqrt{ab}$ 
and their arithmetic mean the number $(a + b)/2$. We will show that for any 
two nonnegative numbers a and b, we have the inequality 

(1)    $\sqrt{ab} \le \frac{a + b}{2},$

that is, the geometric mean does not exceed the arithmetic mean. The 
inequality (1) is exact in the sense that in (1) equality is actually attained. This 
occurs if (and only if) a = b. 




FIGURE 5.2 


The inequality (1) conceals various extremal problems. Two such problems 
are: 


(a) Find the maximum of the product of two numbers whose sum is 
constant. 

(b) Find the maximal area of a right triangle whose small sides have 
constant sum. 


One consequence of the inequality (1) is that among all right triangles with 
prescribed sum of the small sides, the isosceles triangle has maximal area, a 
fact known already to ancient geometers. 

Problem (a) is algebraic by content, problem (b) is geometric. When 
Fermat discovered his method for finding maxima and minima (to be dis- 
cussed in the eleventh story), he presented it in a private letter to Roberval, a 
well-known contemporary mathematician, using problem (b) to illustrate his 
method. 

There are many proofs of the inequality (1). We will give two proofs, one 
of which is algebraic, the other, geometric. 

The algebraic proof is based on the following chain of obvious inequalities: 

$0 \le (a - b)^2 \Rightarrow 2ab \le a^2 + b^2 \Rightarrow 4ab \le a^2 + 2ab + b^2 = (a + b)^2 \Rightarrow \sqrt{ab} \le \frac{a + b}{2}.$

Let’s now turn to geometry (Figure 5.2). We take a segment of length a + b 
(|AD| = a, |DC| = b) and draw a semicircle with [AC] as diameter. We 
erect a perpendicular to AC at D and denote by B its point of intersection 
with the semicircle. In view of the similarity of the triangles ABD and BCD 
(recall that the angle B subtends a semicircle and is therefore a right angle, 
so that angle A is equal to angle DBC and angle C is equal to angle ABD), 
we have 

$\frac{|AD|}{|BD|} = \frac{|BD|}{|DC|} \Rightarrow |BD| = \sqrt{ab}.$

If we now keep [AC] fixed (that is, if we are given the sum a + b) and 
vary the point D, then it is clear that the segment BD will attain its maximal 
length (= (a + b)/2) when D coincides with the center of the semicircle. 
This proves the inequality (1). 

3. The inequality of the arithmetic-geometric means (the general case). We 
will prove the following theorem. 

For arbitrary nonnegative numbers $x_1, \ldots, x_n$ we have the inequality 

(1)    $(x_1 \cdots x_n)^{1/n} \le \frac{x_1 + \cdots + x_n}{n}.$

The left side of (1) is called the geometric mean of the numbers $x_1, \ldots, x_n$ 
and the right side their arithmetic mean. Thus the geometric mean does not 
exceed the arithmetic mean, not only for n = 2 but also for arbitrary n. 
The inequality (1) is exact: it becomes an equality only if all the numbers 
are equal. 

There are many proofs of the inequality (1). One of the most beautiful, as 
well as completely elementary, proofs was formulated by the famous French 
mathematician A. L. Cauchy. 

We begin by using Cauchy’s method to prove inequality (1) for n = 3. To 
this end we deduce (1) for n = 4 and “descend” to n = 3. For n = 4 the 
inequality (1) follows readily if we twice use the established version of (1) 
for n = 2: 

(2)    $x_1 x_2 x_3 x_4 = (x_1 x_2)(x_3 x_4) \le \left(\frac{x_1 + x_2}{2}\right)^2 \left(\frac{x_3 + x_4}{2}\right)^2 = \left(\frac{x_1 + x_2}{2} \cdot \frac{x_3 + x_4}{2}\right)^2 \le \left(\frac{x_1 + x_2 + x_3 + x_4}{4}\right)^4.$

This proves (1) for n = 4. Using (2) we have 

$(x_1 x_2 x_3)^{1/3} = \left[x_1 x_2 x_3 (x_1 x_2 x_3)^{1/3}\right]^{1/4} \le \frac{x_1 + x_2 + x_3 + (x_1 x_2 x_3)^{1/3}}{4}$

$\Rightarrow \frac{3}{4}(x_1 x_2 x_3)^{1/3} \le \frac{x_1 + x_2 + x_3}{4} \Rightarrow (x_1 x_2 x_3)^{1/3} \le \frac{x_1 + x_2 + x_3}{3},$

which proves (1) for n = 3. 
Next we will prove (1) in the general case. First, we note that following 
the approach used earlier to prove the inequality (1) for n = 4, it is possible 
to prove it for n = 8, then for n = 16, and so on, that is, for $n = 2^k$, 
$k = 1, 2, 3, \ldots$ 

Now we will employ the “method of descent” used earlier in going from 
n = 4 to n = 3. Suppose the inequality has been proved for n = m + 1. 
We will now prove it for n = m. By assumption, 

$(x_1 \cdots x_m)^{1/m} = \left((x_1 \cdots x_m)(x_1 \cdots x_m)^{1/m}\right)^{1/(m+1)} \le \frac{x_1 + \cdots + x_m + (x_1 \cdots x_m)^{1/m}}{m + 1}$

$\Rightarrow (x_1 \cdots x_m)^{1/m} \le \frac{x_1 + \cdots + x_m}{m}.$

Since every n lies below some power of 2, repeated descent from that power 
reaches n. This completes the proof of inequality (1). 
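
The two halves of Cauchy's argument can be traced on concrete numbers. The sketch below is ours and is a variant of the descent: instead of stepping down one index at a time, it pads the list with copies of its geometric mean up to the next power of 2 in a single jump, and then applies the case n = 2 pairwise. The function names are illustrative assumptions.

```python
import math

def geometric_mean(xs):
    return math.prod(xs) ** (1 / len(xs))

def pad_to_power_of_two(xs):
    """Append copies of the geometric mean until the length is a power of 2."""
    g = geometric_mean(xs)
    size = 1
    while size < len(xs):
        size *= 2
    return xs + [g] * (size - len(xs))

def pairwise_chain(xs):
    """Halve the list repeatedly, once with sqrt(x*y) and once with (x + y)/2."""
    geo, ari = list(xs), list(xs)
    while len(geo) > 1:
        geo = [math.sqrt(geo[i] * geo[i + 1]) for i in range(0, len(geo), 2)]
        ari = [(ari[i] + ari[i + 1]) / 2 for i in range(0, len(ari), 2)]
    return geo[0], ari[0]

xs = [1.0, 2.0, 4.0]
padded = pad_to_power_of_two(xs)         # three numbers plus their geometric mean
geo, ari = pairwise_chain(padded)
print(padded, geo, ari)
print(geometric_mean(xs), sum(xs) / len(xs))
```

For the padded list the geometric mean is unchanged, and the inequality for four numbers, solved for that mean, gives the inequality for the original three.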

Our proof is but one of many. The well-known book Inequalities, by 
Beckenbach and Bellman, contains twelve proofs of this inequality. Of these, 
the simplest is probably the one due to Ellers. Following Ellers, we prove by 
induction that $x_1 \cdots x_n = 1$, $x_i > 0$, implies the inequality $x_1 + \cdots + x_n \ge n$ 
(from this, the rest follows in an obvious manner). For n = 1, this assertion 
is trivial. Assume that it holds for n = m. Let $x_1 \cdots x_{m+1} = 1$. Then 
there are two numbers (say, $x_1$ and $x_2$) such that $x_1 \ge 1$ and $x_2 \le 1$, that 
is $(x_1 - 1)(x_2 - 1) \le 0$ or, equivalently, $x_1 x_2 + 1 \le x_1 + x_2$. This and the 
induction assumption imply that $x_1 + \cdots + x_{m+1} \ge 1 + x_1 x_2 + x_3 + \cdots + x_{m+1} \ge 
1 + m$, which was to be shown. 

The inequality of the arithmetic-geometric means has always been a fa- 
vorite topic of mathematical clubs. For example, consider the following 
problem. 

In a given sphere, inscribe a cone of maximal volume. 

Let R denote the radius of the sphere and r and h the base radius and 
altitude of the cone, respectively. Then (think this through) the volume V of 
the cone is equal to $\pi h^2 (2R - h)/3$. Using the inequality of the 
arithmetic-geometric means, we obtain 

$\frac{3V}{4\pi} = \frac{h}{2} \cdot \frac{h}{2} \cdot (2R - h) \le \left(\frac{2R}{3}\right)^3,$

with equality attained for $h/2 = 2R - h \Rightarrow h = (4/3)R$. For this value of 
the altitude the cone will have maximal volume. 
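
A quick numerical check of this answer, as a sketch of ours with an illustrative radius R = 1: the volume is scanned over 0 < h < 2R and compared with the bound V ≤ (4π/3)(2R/3)³ that follows from the displayed inequality.

```python
import math

def cone_volume(R, h):
    """Volume of the cone of altitude h inscribed in a sphere of radius R."""
    return math.pi * h**2 * (2 * R - h) / 3

R = 1.0
bound = 4 * math.pi / 3 * (2 * R / 3) ** 3    # from 3V/(4*pi) <= (2R/3)^3
steps = 100_000
grid = [2 * R * k / steps for k in range(1, steps)]
h_best = max(grid, key=lambda h: cone_volume(R, h))
print(h_best, cone_volume(R, h_best), bound)  # h_best close to 4R/3
```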

Here are two more problems. 

In a given cone inscribe a cylinder of maximal volume. 

Given a sheet of tin a × b, cut out equal squares at its corners so that the 
open box obtained by bending the resulting edges has maximal volume. 

Solving such problems by our method is of interest as long as one is not 
acquainted with differentiation. 


4. The inequality of the arithmetic-quadratic means. Let $x_1, \ldots, x_n$ be 
some numbers. By their quadratic mean, we mean the number 
$[(x_1^2 + \cdots + x_n^2)/n]^{1/2}$. The following theorem is true. 

The inequality 

(1)    $\frac{x_1 + \cdots + x_n}{n} \le \left(\frac{x_1^2 + \cdots + x_n^2}{n}\right)^{1/2}$

holds for arbitrary numbers $x_1, \ldots, x_n$, that is, the arithmetic mean does 
not exceed the quadratic mean. The inequality (1) is exact. It becomes an 
equality only if all the numbers are equal. 


The inequality (1) can also be proved in several different ways. The 
following proof is probably the simplest. We have 

(2)    $0 \le (a - b)^2 \Rightarrow 2ab \le a^2 + b^2.$

By squaring the arithmetic mean and using the inequality (2) we obtain 

$\left(\frac{x_1 + \cdots + x_n}{n}\right)^2 = \frac{x_1^2 + \cdots + x_n^2 + 2x_1 x_2 + \cdots + 2x_{n-1} x_n}{n^2}$

$\le \frac{x_1^2 + \cdots + x_n^2 + (x_1^2 + x_2^2) + (x_1^2 + x_3^2) + \cdots + (x_{n-1}^2 + x_n^2)}{n^2}$

$= \frac{n(x_1^2 + \cdots + x_n^2)}{n^2} = \frac{x_1^2 + \cdots + x_n^2}{n},$

which is what we wished to show. 

Juxtaposition of the inequality of arithmetic-geometric means and the 
(established) inequality (1) shows that for arbitrary nonnegative numbers 
$x_1, \ldots, x_n$ we have the exact inequality 

(3)    $(x_1 \cdots x_n)^{1/n} \le \left(\frac{x_1^2 + \cdots + x_n^2}{n}\right)^{1/2}.$

In particular, for n = 2 we have $\sqrt{x_1 x_2} \le \sqrt{(x_1^2 + x_2^2)/2}$. This inequality 
can be easily given a geometric interpretation. For example, it directly implies 
that of all rectangles inscribed in a circle the square has largest area. In turn, 
this problem admits at least two stereometric generalizations. First, of all 
rectangular parallelepipeds inscribed in a sphere, find the one of largest volume. 
Second, of all cylinders inscribed in a sphere, find the one of largest volume. 
Both of these stereometric problems were investigated by Kepler. (We discuss 
this further in the next story.) Incidentally, an immediate consequence of the 
inequality (3) for n = 3 is that of all parallelepipeds inscribed in a sphere, 
the cube has the largest volume. (Think this through!) 

The planimetric problem of the rectangle of largest area inscribed in a cir- 
cle will also turn up later. We will refer to it as Kepler’s planimetric problem. 


5. The Cauchy-Bunyakovskii inequality. The following theorem holds: for 
arbitrary numbers $a_1, \ldots, a_n, b_1, \ldots, b_n$ we have the inequality 

(1)    $a_1 b_1 + \cdots + a_n b_n \le (a_1^2 + \cdots + a_n^2)^{1/2}(b_1^2 + \cdots + b_n^2)^{1/2}.$

The inequality (1) is called the Cauchy-Bunyakovskii inequality. This 
inequality is exact: equality is attained for $a_1 = b_1, \ldots, a_n = b_n$. We will 
prove (1). If $b_1 = \cdots = b_n = 0$, then there is nothing to prove. Suppose that 
not all $b_i$ are zero. For an arbitrary x, we have 

$(a_1 + x b_1)^2 + \cdots + (a_n + x b_n)^2 = a_1^2 + \cdots + a_n^2 + 2x(a_1 b_1 + \cdots + a_n b_n) + x^2 (b_1^2 + \cdots + b_n^2) = ax^2 + 2bx + c,$

where we set 

$a = b_1^2 + \cdots + b_n^2, \quad b = a_1 b_1 + \cdots + a_n b_n, \quad c = a_1^2 + \cdots + a_n^2.$

It is clear that a > 0 and that for all x we have the inequality 

(2)    $ax^2 + 2bx + c \ge 0.$

But the nonnegative character of the quadratic trinomial in (2) is equivalent 
to the inequality 

(3)    $b^2 - ac \le 0 \Leftrightarrow (a_1 b_1 + \cdots + a_n b_n)^2 \le (a_1^2 + \cdots + a_n^2)(b_1^2 + \cdots + b_n^2),$


which is what we wished to show. 
An important generalization of the Cauchy-Bunyakovskii inequality fol- 
lows. 


6. The Hölder inequality. We will show that the following inequality holds 
for nonnegative numbers $a_1, \ldots, a_n, b_1, \ldots, b_n$ and for $p > 1$, $p' = p/(p - 1)$ 
(so that $(1/p) + (1/p') = 1$): 

(1)    $a_1 b_1 + \cdots + a_n b_n \le (a_1^p + \cdots + a_n^p)^{1/p}(b_1^{p'} + \cdots + b_n^{p'})^{1/p'}.$

The inequality (1) is called the Hölder inequality. To prove it, consider the 
functions $y = x^{p-1}$ and $x = y^{p'-1}$. These functions are mutually inverse 
(check this). Choose two positive numbers a and b. Then 

$\int_0^a x^{p-1}\,dx = \frac{a^p}{p}, \qquad \int_0^b y^{p'-1}\,dy = \frac{b^{p'}}{p'}.$


Now look at Figure 5.3. The quantity $a^p/p$ is the area of the vertically 
cross-hatched curvilinear triangle, and $b^{p'}/p'$ is the area of the horizontally 
cross-hatched curvilinear triangle. It is easy to verify that, regardless of the 
disposition of a and b, the sum of the areas of these two triangles is not less 
than the area of the rectangle with sides a and b. Also, equality is possible 
only if $a^{p-1} = b$. This means that for any two nonnegative numbers a and 
b, we have the inequality 

(2)    $ab \le \frac{a^p}{p} + \frac{b^{p'}}{p'}.$
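Now let $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ be arbitrary nonnegative numbers. 
If, say, $b_1 = \cdots = b_n = 0$, then the inequality (1) holds. We can therefore 
assume that $A = (a_1^p + \cdots + a_n^p)^{1/p} \ne 0$ and $B = (b_1^{p'} + \cdots + b_n^{p'})^{1/p'} \ne 0$. 
Put $x_k = a_k/A$, $y_k = b_k/B$. In view of (2), we have 

$x_k y_k \le \frac{a_k^p}{p A^p} + \frac{b_k^{p'}}{p' B^{p'}}, \quad k = 1, \ldots, n.$

Adding these inequalities and bearing in mind that $1/p + 1/p' = 1$ and that 
$a_1^p + \cdots + a_n^p = A^p$, $b_1^{p'} + \cdots + b_n^{p'} = B^{p'}$, we end up with the required 
inequality 

$x_1 y_1 + \cdots + x_n y_n \le 1 \Rightarrow a_1 b_1 + \cdots + a_n b_n \le AB \Rightarrow a_1 b_1 + \cdots + a_n b_n \le (a_1^p + \cdots + a_n^p)^{1/p}(b_1^{p'} + \cdots + b_n^{p'})^{1/p'}.$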
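
The inequality and its equality case can be tried on sample data. The sketch below is ours: `holder_sides` is a hypothetical helper that returns both sides of the Hölder inequality, and the sample vectors are arbitrary. The second call uses $b_k = a_k^{p-1}$, the case in which (2) becomes an equality for the normalized terms, and the third uses p = 2, which gives the Cauchy-Bunyakovskii inequality.

```python
def holder_sides(a, b, p):
    """Return (left side, right side) of the Hoelder inequality for exponent p > 1."""
    q = p / (p - 1)                    # the conjugate exponent p'
    lhs = sum(x * y for x, y in zip(a, b))
    rhs = sum(x**p for x in a) ** (1 / p) * sum(y**q for y in b) ** (1 / q)
    return lhs, rhs

a = [1.0, 2.0, 3.0]
print(holder_sides(a, [4.0, 5.0, 6.0], 3))       # left side below right side
print(holder_sides(a, [x**2 for x in a], 3))     # b_k = a_k^(p-1): equality
print(holder_sides(a, [4.0, 5.0, 6.0], 2))       # p = 2: Cauchy-Bunyakovskii
```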


The topic of exact inequalities is unusually extensive. Many books and 
papers deal with such inequalities. One of the best known is the book [7] by 
Hardy, Littlewood, and Polya. This topic is also well covered in the popular 
literature. 

As a rule, using the general methods that we will talk about in the second 
half of this book, it is possible to prove many exact inequalities without 
difficulty. But there are exceptions. Here are two problems that are easy to 
state, but proving their respective extremal properties is not, I think, a simple 
matter. The reader should try to solve them. He may hit on some simple 
solutions. 


PROBLEM 1. Find the least value of the sum of the fourth powers of an odd 
number of quantities $x_1, \ldots, x_{2n+1}$, given that their sum and the sum of their 
cubes are both zero and the sum of their squares is one. 


PROBLEM 2. A hundred positive numbers $x_1, \ldots, x_{100}$ satisfy the 
conditions $x_1^2 + \cdots + x_{100}^2 > 10000$, $x_1 + \cdots + x_{100} < 300$. Show that there are 
among them three numbers whose sum is greater than 100. 


The Sixth Story 
Kepler’s Problem 


Near a maximum the decrements on both sides are 
in the beginning only imperceptible. 


J. Kepler 


Goethe wrote that “when you confront Kepler’s life story with what he 
became and what he achieved, you are at once joyfully astounded and con- 
vinced that true genius is bound to overcome all obstacles.” This story depicts 
one of the most radiant, noble, and exalted geniuses that has ever existed. 
Since I am not in a position to illuminate this remarkable personality with 
a measure of thoroughness, I must refer the reader to two books on Kepler.¹ 
In his book [8R], Predtecenskii describes with rare beauty Kepler’s moral 
eminence. In his recent book Johann Kepler [3R], Belyi describes in detail 
Kepler’s scientific progress and his genius. Belyi’s book also includes an 
extensive bibliography. 

It seems to have been Kepler’s fate to be spared no trial. He endured 
poverty, privation, sickness, the death of loved ones, upheavals, and exile. 
And yet, when we read him, we are invariably conscious of his thankfulness 
to fate for the gift of joy and happiness, the joy of labor, and the pursuit of 
truth. This is how he rhapsodized about his third law of planetary motions: 


I yield freely to the sacred frenzy; I dare frankly to confess 
that I have stolen the golden vessels of the Egyptians to build 
a tabernacle for my God far from the bounds of Egypt. If 
you pardon me, I shall rejoice; if you reproach me, I shall 
endure. The die is cast, and I am writing the book, to be 
read either now or by posterity, it matters not. It can wait a 
century for a reader, as God himself has waited six thousand 
years for a witness. 


¹Two English books on Kepler are: J. Banville, Kepler. A novel, Secker and Warburg, London, 
1981, and A. Koestler, The sleepwalkers, Hutchinson, London, 1959; Penguin, New York, 1964. 
(A.S.) 


Kepler seems to have derived much the same joy from the discoveries of 
others. This is how he writes to Galileo when expressing delight at the latter’s 
discovery of the satellites of Jupiter: 


I stayed home, did nothing, and thought of you, dear and 
famous Galileo when I suddenly learned of your discovery 
of four planets with the aid of the telescope... I could not 
think without extreme agitation that in this way our ancient 
argument was resolved... I may seem somewhat daring if I 
so readily trust your claims unsupported by any of my own 
tests. But why should I not believe the most learned of math- 
ematicians whose correctness is confirmed by the very mode 
of his reasoning. 


Kepler referred to Snel as the Apollonius of his time. When he ran into 
difficulties in solving geometric problems, he addressed Snel thus: 


Produce for us, oh Snel, glory of the geometers of our time, 
solutions of this and other problems that are now required. 


Kepler was utterly convinced that any persons pursuing the truth will be 
happy to learn of its discovery. He addressed such people in these words: 


In some places it is necessary to dwell at length, so that ... 
learned men would know what to profit by and what to enjoy. 


Predtecenskii writes of Kepler: 


He is forever unaffected and true to himself. Conceit and 
ambition are foreign to his lofty mind. He sought neither 
honors nor praise. He never claims to be superior to schol- 
ars that are now virtually unknown, and all his life referred 
with profound respect to Maestlin, whose sole distinction is 
that he had the good fortune of having Kepler for a student 

Tycho Brahe ... was his chief antagonist, for he re- 
jected the Copernican theory so zealously advocated by Ke- 
pler. We know that the relations between the two great men 
were marred by many unpleasant incidents. And yet Kepler 
invariably praises Tycho, gives him his due, and makes no 
attempt to diminish his merits ... Here, and at all times, 
Kepler shows himself to be a champion of the truth. Sad to 
say, this is all too rare ... in our own time. 
Kepler had every reason to say of himself, “I am used to telling the truth 
everywhere and at all times.” 

The theme of genius is not well developed in Russian literature. It was 
not dealt with by either Tolstoy or Dostoevsky. Pushkin is an exception. 
The word “genius” is articulated in Mozart and Salieri, but most of the time 
Pushkin uses the more inclusive term “Poet.” 

Artlessness, loyalty in friendship, the creative urge, the ability to delight 
in all that is beautiful and, of course, an inability to do evil—all these are 
characteristics of genius that Pushkin bestowed on his Mozart and that were 
also markedly present in Kepler. Like the Poet, he was invariably guided by 
the motto: “Follow the free mind wherever it leads.” Just as “the Poet alone 
chooses the subjects of his poems,” so too Kepler chose the subjects of his 
researches. 

This story is devoted to one such subject. 

In his book, New solid geometry of wine barrels, Kepler describes an event 
in his life that occurred in the fall of 1613: 


In December of last year ... I brought home a new wife 
at a time when Austria, having brought in a bumper crop 
of noble grapes, distributed its riches ... The shore in Linz 
was heaped with wine barrels that sold at a reasonable price 

That is why a number of barrels were brought to my 
house and placed in a row, and four days later the salesman 
came and measured all the tubs, without distinction, without 
paying attention to the shape, without any thought or com- 
putation. Namely the copper point of a ruler was pushed 
through the filling hole of a barrel, across the heel of each of 
the wooden disks which we refer to simply as bottoms, and 
as soon as the length to the point at the top of one board 
disk was the same as the length to the point at the bottom of 
the other, the salesman stated the number of amphoras con- 
tained in the barrel after merely noting the number on the 
ruler at the spot where the length in question ended. I was 
astonished ... 


Kepler thought it strange that by means of a single measurement (see Fig- 
ure 6.1 on page 50, taken from Kepler’s book), one could determine the 
volumes of barrels of different shape. He goes on: 


Like a bridegroom, I thought it proper to take up a new sub- 
ject of mathematical studies and to investigate the geometric 
laws of a measurement so useful in housekeeping, and to clar- 
ify its basis if such exists. 


In order to clarify a basis of this kind Kepler had to lay the foundations of 
differential and integral calculus, as well as advance new ideas for the solution 
of maximum and minimum problems. 

FIGURE 6.2 

The key result in the book New solid geometry of wine barrels is Theorem 
V [Part Two]: “Of all cylinders with the same diagonal, the largest and most 
capacious is that in which the ratio of the base diameter to the height is $\sqrt{2}$.” 
In other words, this theorem provides the solution of the following problem: 
Inscribe in a given sphere a cylinder of maximal volume. The corresponding 
problem in the plane is to inscribe in a given circle a rectangle of maximal 
area. Hereafter we will call the first of these problems Kepler’s problem and 
the second Kepler’s planimetric problem. 

To begin, we solve Kepler’s problem by a method that Tartaglia would 
have employed (had he posed it). Let $R$ be the radius of the sphere. We 
denote by $x$ half the height of the cylinder. (See Figure 6.2.) Then the 
base radius is $\sqrt{R^2 - x^2}$ and the volume of the cylinder is $2\pi(R^2 - x^2)x$. 
We recall that in Tartaglia’s case we had $(8^2 - x^2)x/4$. The formula in the 
preceding story yields the maximizing value $\hat{x} = R/\sqrt{3}$ and 

$$\sqrt{R^2 - \hat{x}^2} = \sqrt{\tfrac{2}{3}}\,R = \sqrt{2}\,\hat{x}.$$

This implies that in the extremal cylinder the ratio of the base diameter to 
the height is $\sqrt{2}$. This coincides with Kepler’s result. 

FIGURE 6.3 FIGURE 6.4 
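
The computation above can be cross-checked numerically. The following Python sketch is only an illustration and is not part of Kepler's or Tartaglia's argument; the function names and the brute-force grid search are assumptions made for the example. It scans the half-height $x$ over $[0, R]$, locates the largest value of $2\pi(R^2 - x^2)x$, and compares the result with $R/\sqrt{3}$ and the ratio $\sqrt{2}$.

```python
import math

def cylinder_volume(x, R=1.0):
    """Volume of the cylinder of half-height x inscribed in a sphere of radius R."""
    return 2.0 * math.pi * (R * R - x * x) * x

def argmax_on_grid(f, lo, hi, steps=200_000):
    """Return the grid point in [lo, hi] where f is largest (brute-force scan)."""
    best_x, best_v = lo, f(lo)
    for k in range(1, steps + 1):
        x = lo + (hi - lo) * k / steps
        v = f(x)
        if v > best_v:
            best_x, best_v = x, v
    return best_x

R = 1.0
x_hat = argmax_on_grid(lambda x: cylinder_volume(x, R), 0.0, R)
base_radius = math.sqrt(R * R - x_hat * x_hat)
ratio = (2.0 * base_radius) / (2.0 * x_hat)   # base diameter : height

print(x_hat, R / math.sqrt(3))   # the two numbers should nearly agree
print(ratio, math.sqrt(2))       # the two numbers should nearly agree
```

A grid search is used here only because it needs no calculus at all, which is in the spirit of the story; any one-dimensional maximizer would do.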

In solving this problem Kepler could have used his idea of the insensitiv- 
ity of the variation of a function near its maximum (see the epigraph for 
this story). But he ignored this possibility and provided a purely geometric 
solution. 

Kepler reduced the problem of the most capacious cylinder to the solution 
of the following maximum problem: Of all rectangular parallelepipeds with 
a square base inscribed in a sphere the one with largest volume is the cube.² 
This result is proved in Theorem IV of Part Two of Kepler’s book. 

For brevity’s sake, Kepler called a rectangular parallelepiped with square 
base a post, and we will do likewise. We distinguish two cases: (a) the post 
is higher than the cube; and (b) the post is lower than the cube. 

Let’s look first into case (a). (See Figures 6.3 and 6.4.) Consider the cube 
$ABCDEFGH$ and the “post” $A'B'C'D'E'F'G'H'$ inscribed in the same 
sphere (the points $D$, $H$, and $H'$ are invisible in Figure 6.3). We compare 
their volumes. Two parallelepipeds with square bases protrude from the cube, 
namely $A'B'C'D'A''B''C''D''$ above it and one of equal volume below it. 

But far more can be “subtracted” from the cube. This is easy to see: At 
each side of the square $A''B''C''D''$ there is a parallelepiped that borders 
on the post whose base is a square congruent to $A''B''C''D''$. We denote 
one of these parallelepipeds by $A''B''QRMNPL$. The volume of these four 
parallelepipeds alone exceeds that of the protruding parts of the post. In 
fact, the volume of the protruding parts is equal to $2|A''B''|^2|A''A'|$ and the 
volume of the bordering parallelepipeds is $4|A''B''|^2|A''M|$. But 
$|A''M| = |A''A|/\sqrt{2}$. 

Now we consider the triangle $A'AA''$ (Figure 6.4). The angle $\alpha = \angle A'AA''$ 
subtends the arc $A'C$ and the angle $\beta = \angle AEC$ subtends the larger arc 
$AC$. This means that $\alpha < \beta$, and thus 
$|A''A'|/|A''A| = \tan\alpha < \tan\beta = |CA|/|AE| = \sqrt{2}$. It follows that 

$$2|A''B''|^2|A''A'| < 2|A''B''|^2|A''A|\sqrt{2} = 2|A''B''|^2\sqrt{2}\,\sqrt{2}\,|A''M| = 4|A''B''|^2|A''M|.$$

This proves the required inequality of volumes. 

² This is a special case of the problem discussed in the previous story. 

It remains to consider the case (b). (See Figure 6.5.) Again, let the cube 
$ABCDEFGH$ and the post $A'B'C'D'E'F'G'H'$ be inscribed in the same 
sphere (the points $H$, $D'$, and $H'$ are invisible in Figure 6.5). We compare 
their volumes. The combined volume of the two parallelepipeds with square 
base that protrude from the post is $2|AB|^2|AA''|$. The volume of the part 
of the post that protrudes from the cube is less than this combined volume. 
Kepler proves this in the following manner. 

Let us, says Kepler, attach to each lateral face of the cube a parallelepiped, 
or panel, of the same thickness as the protruding part of the post. (One 
such panel, namely $ABFEPQRS$, is shown in Figure 6.5.) Their combined 
volume is $4|AB|^2|AP| = 4|AB|^2|A''A'|/\sqrt{2}$. Again, by considering the 
triangles $AA'A''$ and $AEC$ (in Figure 6.4), we show that angle $\alpha$ is greater 
than angle $\beta$, so that $|AA''|/|A''A'| > \sqrt{2}$. But, as Kepler correctly 
observes, even if we attach the four panels to the cube, parts of the post 
remain exposed. They are four “gaping” parallelepipeds at the edges of the 
post (one such is $B''LB'OF''MF'N$ shown in Figure 6.5 ($F''$ is invisible)). 
Each of these is part of a little post erected at one of the edges $AE$, $BF$, 
$CG$, and $DH$ of the cube. The volume of each of these four little posts is 
$|AB||AP|^2$. Now the panels we applied earlier to the lateral faces of the cube 
stick out beyond the height of the post by eight little panels (one of which is 
$BQPATA''B''L$ in Figure 6.5). The volume of each of these eight little 
panels is $|AB||AP||AA''|$. The inequality $2|AA''| > 2\sqrt{2}|A''A'| = 4|AP|$ implies 
that the volume of the four little posts at the edges of the cube is less than the 
volume of the eight little panels. Thus the volume that protrudes from the 
post is $2|AB|^2|AA''| > 4|AB|^2|AP|$, while the volume that protrudes from the 
cube is less than $4|AB|^2|AP| - 8|AB||AP||AA''| + 4|AB||AP|^2 < 4|AB|^2|AP|$. 
It follows that in the transition from cube to post, the cube loses more than 
it gains. This completes the solution of the auxiliary problem. (We recall 
that in the previous story we investigated a somewhat more general problem 
by algebraic means.) 
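
Kepler's auxiliary result can also be checked with a few lines of arithmetic. The sketch below is an illustration rather than a reconstruction of Kepler's geometric argument; the function name `post_volume` and the sample heights are assumptions chosen for the example. It uses the fact that a post with base side $s$ and height $h$ inscribed in a sphere of radius $R$ has space diagonal $2R$, so that $2s^2 + h^2 = 4R^2$, and it compares posts higher than the cube (case (a)) and lower than the cube (case (b)) with the cube itself.

```python
import math

def post_volume(h, R=1.0):
    """Volume of a square-based box ("post") of height h inscribed in a sphere
    of radius R.  The space diagonal equals 2R, so 2*s**2 + h**2 = (2R)**2."""
    s_squared = (4.0 * R * R - h * h) / 2.0
    return s_squared * h

R = 1.0
cube_edge = 2.0 * R / math.sqrt(3.0)     # edge of the cube inscribed in the same sphere
cube_volume = cube_edge ** 3

# Negative deviations give posts lower than the cube (case (b)),
# positive deviations give posts higher than the cube (case (a)).
heights = [cube_edge * (1.0 + d) for d in (-0.5, -0.2, -0.05, 0.05, 0.2, 0.5)]
for h in heights:
    print(f"h = {h:.4f}  post volume = {post_volume(h, R):.6f}  cube = {cube_volume:.6f}")
```

In every sampled case the post volume stays below the cube volume, in agreement with Theorem IV.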

The rest of the proof is very simple. In every cylinder one can inscribe a 
post and the ratio of their volumes (in this order) is constant and equal to 
$\pi/2$ (check this). This means that the cylinder of largest volume inscribed in 
a sphere is the one in which we can inscribe a cube. And in such a cylinder 
the ratio of base diameter to height is $\sqrt{2}$. 

After proving this theorem Kepler wrote: 


From this it is clear that, when making a barrel, Austrian 
barrelmakers, as if guided by common and geometric sense, 
take as the radius of a bottom a third of the length of a 
stave. When this is done, the cylinder constructed in the 
mind between two bottoms will consist of two halves, each 
of which will be close to the conditions of theorem V and will 
thus have maximal capacity even if one deviated somewhat 
from the exact rules during the making of the barrel, because 
figures close to the optimal change their capacity very little. 
This is so because near a maximum the decrements on 
both sides are in the beginning only imperceptible. 


Kepler’s concluding words contain the fundamental algorithm for finding 
extrema that was later shaped into an exact theorem. First described (for 
polynomials) by Fermat (1629) and then, in general form, by Newton and 
Leibniz, this algorithm was later called “Fermat’s theorem.” A great deal of 
interesting information about Kepler’s problem can also be found in M. B. 
Balk’s article, “The secret of the old barrelmaker” (Kvant, 1986, 8, p. 14, in 
Russian). 
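
Kepler's remark that near a maximum "the decrements on both sides are in the beginning only imperceptible" can be illustrated on his own cylinder. The following sketch is an assumption-laden illustration (the function names and the sample deviations are not from the text): it perturbs the optimal half-height $\hat{x} = R/\sqrt{3}$ by a relative amount `eps` and reports the fraction of the maximal volume that is lost.

```python
import math

def cylinder_volume(x, R=1.0):
    """Volume of the cylinder of half-height x inscribed in a sphere of radius R."""
    return 2.0 * math.pi * (R * R - x * x) * x

def relative_loss(eps, R=1.0):
    """Fraction of the maximal volume lost when the half-height is x_hat*(1+eps)."""
    x_hat = R / math.sqrt(3.0)
    v_max = cylinder_volume(x_hat, R)
    return 1.0 - cylinder_volume(x_hat * (1.0 + eps), R) / v_max

for eps in (0.10, 0.01, -0.01, -0.10):
    print(f"deviation {eps:+.2f} -> relative loss {relative_loss(eps):.6f}")
```

A deviation of one percent in the height costs only a few hundredths of a percent of the volume: the loss is of second order in the deviation, which is the content of what was later called Fermat's theorem.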


The Seventh Story 


7 


The Brachistochrone 


If one considers motions with the same initial and 
terminal points then, the shortest distance between 
them being a straight line, one might think that the 
motion along it needs least time. It turns out that 
this is not so. 


Galileo Galilei 


The profound significance of well-posed problems for 
the advancement of mathematical science is undeni- 
able. 


D. Hilbert 


Acta Eruditorum, the first scientific journal, began publication in 1682. In 
the June 1696 issue of this journal, there appeared a note by the famous 
Swiss scholar Johann Bernoulli with the intriguing title, “A new problem that 
mathematicians are invited to solve.” 

It is often the case that the statement of a new problem attracts the atten- 
tion of many eminent scholars. By competing with one another they create 
powerful methods for the solution of problems that later offer great service 
to science. This was the case with Johann Bernoulli’s problem. Its author 
stated it as follows: 

Let two points A and B (Figure 7.1 on page 56) be given in a vertical 
plane. Find the curve that a point M, moving on a path AMB, must follow 
such that, starting from A, it reaches B in the shortest time under its own 
gravity. 

When posing his problem, Bernoulli made no mention of Galileo. How 
unfair! All modern natural science “issued” from Galileo. Not only did he 
discover the fundamental laws of mechanics, but Galileo also was the first 
to put questions to Nature. The present stage of the development of science 
began when Galileo ascended the tower of Pisa to ask Nature about the laws 
of falling bodies. 

Galileo experimented with inclined planes and, apparently, also with cir- 
cular chutes. We quote from Discourses on mechanics, his life’s main work: 


Experience shows that bodies falling down circular arcs cor- 
responding to chords inclined with respect to the horizon ... 
perform motions that also take equal time intervals, shorter 
than those for motions along the chords. 


Of Galileo’s two assertions on motions along circular arcs, only one is true: 
a motion along an arc is faster than one along a chord. The claim about the 
equality of time intervals is only approximately correct, and, as it turned out 
later, this fact is intimately related to Bernoulli’s problem. 

Be that as it may, Galileo’s assertion, and his assertion that serves as an 
epigraph for this story, both must face Bernoulli’s question: which curve 
corresponds to the shortest time interval, that is, which curve is the brachis- 
tochrone (Greek for quickest)? Many authors upbraid Galileo for having 
mistakenly claimed that a circular arc is a brachistochrone. In the Discourses 
Galileo returned on a number of occasions to the topic of comparing motion 
on a circle with motion on a chord, but in no place can his words be inter- 
preted as the claim that among all curves joining two points, motion along a 
circular arc is shortest. However, it is conceivable that we have overlooked 
some such pronouncement of his. 

Many mathematicians responded to Johann Bernoulli’s “invitation.” One 
of the first to solve the brachistochrone problem was Leibniz, to whom the 
problem appealed and who called it “splendid.” Next Jakob Bernoulli 
(Johann’s brother) and l’Hospital announced their success. And, of course, 
Johann Bernoulli himself had a solution. All of these scholars made 
significant contributions to the emerging new school, that of mathematical 
analysis. There was also an anonymous solution identified by experts as provided 
by Newton (who later admitted that it took him 12 hours of uninterrupted 
reflection to arrive at a solution). Ex ungue leonem (tell a lion by his claw) 
was Johann Bernoulli’s comment on Newton’s solution. 

All of these mathematicians arrived at the same conclusion. The brachis- 
tochrone is the cycloid. At this point it is appropriate to examine this remark- 
able curve. 

A cycloid is the path described by a point on a circle that rolls without 
sliding on a straight line. Let’s derive its equation. 

Let $l$ be a horizontal line and let a circle with radius $R$ and center $O$ 
roll along $l$. Suppose that at time zero the point to be observed is the point 
of contact of the circle and the line $l$. We denote it by $A_0$. Consider the 
rectangular coordinate system with $A_0$ as origin and $l$ as x-axis (Figure 7.2). 
We wish to determine the position of $A_0$ following a clockwise rotation of 
the circle through $\varphi$. To this end we mark on the original circle the point $A_\varphi$ 
such that the angle $A_0 O A_\varphi$ is $\varphi$. When the circle has turned through 
an angle $\varphi$, $A_\varphi$ will be the new point of contact with the line $l$. Since the 
length of the arc from $A_0$ to $A_\varphi$ is $R\varphi$, this will be the abscissa of the new 
position of the center of the circle. The new position of the point $A_0$ will 
be such that the “new version” of the angle $A_0 O A_\varphi$ is again $\varphi$. Hence, the 
coordinates $(x(\varphi), y(\varphi))$ of $A_0$ will be 

$$x(\varphi) = R(\varphi - \sin\varphi), \qquad y(\varphi) = R(1 - \cos\varphi).$$

This, then, is the equation of the cycloid that passes through the origin for 
$\varphi = 0$. In general, we have an additional parameter $C_1$: 

(1)   $x(\varphi) = R(\varphi - \sin\varphi) + C_1, \qquad y(\varphi) = R(1 - \cos\varphi).$
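
The rolling-circle description and formula (1) can be compared directly. The sketch below is illustrative only; the helper names `cycloid_point` and `rolling_center` are assumptions, not notation from the text. It evaluates (1) for a few angles and checks that the traced point always lies at distance $R$ from the center of the rolled circle, which sits at $(R\varphi + C_1, R)$.

```python
import math

def cycloid_point(phi, R=1.0, C1=0.0):
    """Point of the cycloid (1) for rotation angle phi."""
    return R * (phi - math.sin(phi)) + C1, R * (1.0 - math.cos(phi))

def rolling_center(phi, R=1.0, C1=0.0):
    """Center of the rolling circle after it has turned through phi."""
    return R * phi + C1, R

R = 2.0
for phi in (0.0, math.pi / 2, math.pi, 2 * math.pi):
    x, y = cycloid_point(phi, R)
    cx, cy = rolling_center(phi, R)
    dist = math.hypot(x - cx, y - cy)   # should always equal R
    print(f"phi = {phi:.3f}  point = ({x:.3f}, {y:.3f})  distance to center = {dist:.3f}")
```

At $\varphi = \pi$ the point reaches its greatest ordinate $2R$, and at $\varphi = 2\pi$ it touches the line again after one full arch of length $2\pi R$ along the x-axis.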


What is so remarkable about this curve? How did it arise? 

The cycloid first turned up in the works of Galileo as an illustrative 
example. He called it cycloid, meaning “circle-related.” The curve was soon 
rediscovered in France (by Mersenne, Roberval, Descartes, and Pascal) and 
named a roulette or a trochoid. The first marvel that involved it was that it 
became a kind of firing range for trying out new forms of the weapons that 
subsequently entered the arsenal of mathematical analysis. 

Ancient mathematics bequeathed very few curves to subsequent genera- 
tions. The main curves studied in antiquity were the circle and the conics, 
that is, the ellipse, the hyperbola, and the parabola, curves which turned up 
in the works of Apollonius. We should also mention the quadratrix, the 
cissoid, the conchoid, and the spiral. Luckily, the first laws of mechanics 
did not go beyond this supply of curves; the planets move along ellipses, and 
thrown objects describe parabolic arcs. 

The best mathematicians of the seventeenth century (including, in addition 
to those named previously, Viviani, Torricelli, and some others) perfected 
their new methods of investigation on the cycloid; they obtained tangents to 
it, determined areas under it, computed the length of its arcs, and so on. 

Then came the second marvel. The cycloid became the first “nonancient” 
curve connected with the laws of nature. It turned out that the cycloid, and 
not—as Galileo wrote—the circle, has the property that a body that glides 
along it without friction oscillates with a period unaffected by its initial posi- 
tion. This tautochrone (equal-time) property of the cycloid was discovered by 
Huygens, and produced a long-lasting sensation. Huygens himself wrote that 
“The most desirable fruit, a kind of high point of Galileo’s teaching about 
falling bodies, is my discovery of the property of the cycloid.” This was the 
second appearance of the cycloid in a completely new context. 
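
The tautochrone property lends itself to a numerical experiment. The sketch below is an illustration under stated assumptions: the cycloid is taken in the form $x = R(\varphi - \sin\varphi)$, $y = R(1 - \cos\varphi)$ with $y$ measured downward, the body starts at rest at the point with parameter $\varphi_0$, and the function name `descent_time`, the value of $g$, and the change of variable used to tame the integrand are all choices made for the example. Energy conservation gives $dt = \sqrt{R/g}\,\sqrt{(1 - \cos\varphi)/(\cos\varphi_0 - \cos\varphi)}\,d\varphi$, and the time to reach the lowest point $\varphi = \pi$ is obtained by integrating this expression.

```python
import math

def descent_time(phi0, R=1.0, g=9.81, steps=20_000):
    """Time to slide without friction from the point phi0 of an inverted
    cycloid to its lowest point (phi = pi), starting at rest.

    The substitution phi = phi0 + (pi - phi0) * s**2 removes the singularity
    of the integrand at the starting point, so a midpoint rule is adequate."""
    span = math.pi - phi0
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) / steps
        phi = phi0 + span * s * s
        integrand = math.sqrt((1.0 - math.cos(phi)) / (math.cos(phi0) - math.cos(phi)))
        total += integrand * 2.0 * span * s / steps
    return math.sqrt(R / g) * total

for phi0 in (0.5, 1.5, 2.5):
    print(f"start at phi0 = {phi0}: time to bottom = {descent_time(phi0):.5f} s")
print("pi * sqrt(R/g) =", math.pi * math.sqrt(1.0 / 9.81))
```

The computed times agree with one another and with the closed form $\pi\sqrt{R/g}$, whatever the starting point, which is the equal-time property Huygens discovered.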

Let’s now solve the problem. Recall that there were five solutions due, 
respectively, to Johann Bernoulli, Leibniz, Jakob Bernoulli, l’Hospital, and 
Newton. All of them were of great interest. Leibniz used a method which 
was further developed by Euler (its essence can be surmised from Leibniz’s 
letter to Johann Bernoulli quoted in the sequel). Nowadays, the Leibniz-Euler 
method is one of the basic methods for the solution of problems on 
maxima and minima and is known as the direct method of the calculus of 
variations. Jakob Bernoulli based his solution on Huygens’ principle and 
thus took another step toward the creation of the Hamilton-Jacobi theory 
(mentioned briefly in the third story). But the most popular solution has 
been that found by the author of the problem. It has been reproduced in 
countless books, and we too will reproduce it here. 

First, introduce in the plane a rectangular coordinate system with horizontal 
x-axis and downward-directed y-axis. We place the point A at the origin 
(Figure 7.1). Let y = f(x) be the equation of the curve (chute) joining the 
point A to the point B with coordinates (a, b). We must now determine 
the time it takes a body M of mass m to fall (without friction) from A 
to B along the chute f(x). From mechanics, we know Galileo’s law which 
asserts that the velocity of a body at a point with coordinates (x, f(x)), in 
a frictionless motion under gravity, is independent of the form of the curve 
joining A to (x, f(x)) and depends solely on the ordinate f(x). In the 
words of Johann Bernoulli, “The velocities of falling weighted bodies are to 
each other as the square roots of the traversed altitudes.” 

In fact, the kinetic energy of the body at $(x, f(x))$ is $mv^2/2$ and is 
equal to the difference $mgf(x)$ of the potential energies. In sum, the velocity 
at $(x, f(x))$ is $\sqrt{2gf(x)}$, where $g$ is the acceleration due to gravity. 
Next we consider the portion of the path between the points $(x, f(x))$ 
and $(x + dx, f(x + dx))$, where $dx$ is a small increment of the abscissa. 
The length $ds$ of this portion of the path is approximately equal to 
$\sqrt{dx^2 + (f(x + dx) - f(x))^2}$. Using the approximate equality 
$f(x + dx) - f(x) = f'(x)\,dx$, we have $ds = \sqrt{1 + (f'(x))^2}\,dx$. On a small portion 
of the path, it is reasonable to suppose the velocity constant and equal to 
$\sqrt{2gf(x)}$. This means that the time required to traverse it is approximately 
equal to $dt = \frac{\sqrt{1 + (f'(x))^2}}{\sqrt{2gf(x)}}\,dx$, and that the total time $T$ of the motion from 
$A$ to $B$ is given by the integral 

(2)   $T = \int_0^a \frac{\sqrt{1 + (f'(x))^2}}{\sqrt{2gf(x)}}\,dx.$

This leads to the following analytic formulation of the brachistochrone problem: To 
find the minimum of the integral $\int_0^a \left(\sqrt{1 + (f'(x))^2}\big/\sqrt{2gf(x)}\right)dx$ over all 
functions $f$ with $f(0) = 0$, $f(a) = b$. 

We have expressed our problem in mathematical language (more specifically, 
in the language of the integral calculus). This procedure is called the 
formalization of the problem. (More will be said about this matter in the 
tenth story.) In the Soviet Union, the elements of integral calculus are now 
taught in the last (tenth) year of school. Since this book is intended for a 
larger audience of high school students and not just for those who have 
completed the tenth class, we will carry out our derivation without the aid of the 
calculus, in the spirit of seventeenth-century mathematics. In this connection 
we recall that the basic notions of analysis were introduced just 12 years 
before Johann Bernoulli’s paper, and that the period of “rigor” was as yet in 
the distant future. 

We divide the segment $[0, b]$ of the ordinate axis into $n$ parts by means 
of the points $0 = y_0, y_1, y_2, \dots, y_n = b$ and find abscissas $x_i$ such that 
$f(x_1) = y_1$, $f(x_2) = y_2$, ..., $f(x_{n-1}) = y_{n-1}$, $x_n = a$. We join the points 
$(x_i, y_i)$ and $(x_{i+1}, y_{i+1})$, $i = 0, 1, \dots, n-1$, by line segments. In this 


way, in addition to the function $y = f(x)$, we obtain a broken line $L_n$ that 
also joins $A$ and $B$. (See Figure 7.3 on page 60.) The greater the number 
$n$, the closer this broken line approximates the function $y = f(x)$. Also, the 
gliding time of the body $M$ along this broken line will be close to its gliding 
time along the chute given by the function $y = f(x)$. 

FIGURE 7.3 

We may suppose the velocity on the $i$th segment constant and equal to 
$\sqrt{2gy_{i+1}}$, $i = 0, 1, \dots, n-1$. With this assumption, the exact time of 
traversal of the $n$-link broken line is 

$(2_n)$   $T_n = \frac{\sqrt{y_1^2 + x_1^2}}{\sqrt{2gy_1}} + \frac{\sqrt{(y_2 - y_1)^2 + (x_2 - x_1)^2}}{\sqrt{2gy_2}} + \dots + \frac{\sqrt{(y_n - y_{n-1})^2 + (x_n - x_{n-1})^2}}{\sqrt{2gy_n}},$

and this time is to be minimized. 

The limit of $T_n$ as $n \to \infty$ is the required time of motion of the body 
$M$ along the curve $y = f(x)$. This limit coincides with the previously 
introduced integral (2). But we will not solve the problem formulated earlier by 
means of integral calculus. Rather, following Johann Bernoulli, we will solve 
the approximate “discrete” problem corresponding to $(2_n)$. Its formulation 
follows. 

Given two points $A$ and $B$ with respective coordinates $(0, 0)$ and $(a, b)$. 
On lines $l_1, \dots, l_{n-1}$ parallel to the x-axis and having respective ordinates 
$y_1, \dots, y_{n-1}$, find points $D_1 = (x_1, y_1), \dots, D_{n-1} = (x_{n-1}, y_{n-1})$ such that 
the sum $T_n$ is minimal (assuming that $x_0 = y_0 = 0$, $x_n = a$, and $y_n = b$). 
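
The discrete sum is easy to evaluate by machine. The sketch below is illustrative and rests on assumptions of its own: the function names, the sample endpoint $B = (2, 1)$, the value of $g$, and the choice of a straight chute as the test curve are not taken from the text. It computes the sum $T_n$ for a broken line, using on each link the speed $\sqrt{2gy_{i+1}}$ as above, and compares it with the known sliding time $|AB|\sqrt{2/(gb)}$ on an inclined plane.

```python
import math

def broken_line_time(points, g=9.81):
    """Traversal time T_n of a broken line through `points`
    [(x_0, y_0), ..., (x_n, y_n)], with y measured downward from A = (0, 0).
    On the i-th link the speed is taken constant and equal to sqrt(2*g*y_{i+1})."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        total += length / math.sqrt(2.0 * g * y1)
    return total

def straight_chute(a, b, n):
    """Vertices of the straight segment from (0, 0) to (a, b), cut by n - 1
    equally spaced horizontal lines."""
    return [(a * k / n, b * k / n) for k in range(n + 1)]

a, b, g = 2.0, 1.0, 9.81
exact = math.hypot(a, b) * math.sqrt(2.0 / (g * b))   # sliding time on an inclined plane
for n in (10, 100, 10_000):
    print(n, broken_line_time(straight_chute(a, b, n), g), exact)
```

Because each link is traversed at the speed reached at its lower end, the sum underestimates the true time, and it approaches the exact value only slowly as $n$ grows. This is consistent with the statement that the limit of $T_n$ is the integral (2).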

To solve this problem, Johann Bernoulli applied a remarkable method that 
has strongly influenced all of the subsequent history of the natural sciences. 
It is this method that we want to discuss next. 

The optical-mechanical analogy. Let’s go back to the third story. In that 
story we formulated and solved the problem of refraction of light. Now we 
will thoroughly investigate its content and compare it with the problem $(2_n)$ 
for $n = 2$. In both cases we are given two points and a horizontal line and 
must find on that line a point so as to minimize the sum of “weighted” lengths. 
The difference in this case is that, whereas in the third story the velocities $v_1$ 
and $v_2$ were arbitrary and we essentially minimized the function 

(3)   $\frac{\sqrt{y_1^2 + x_1^2}}{v_1} + \frac{\sqrt{(a - x_1)^2 + (b - y_1)^2}}{v_2}$

(for $A = (0, 0)$, $B = (a, b)$, and $l$ given by $y = y_1$), here we are dealing 
with a special case, namely we are required to minimize the function 

(3')   $\frac{\sqrt{y_1^2 + x_1^2}}{\sqrt{2gy_1}} + \frac{\sqrt{(a - x_1)^2 + (b - y_1)^2}}{\sqrt{2gb}}.$

Leibniz arrived at the same conclusion and then went his own way. He gave 
a wonderful description of his method in his letter of June 16, 1696, to 
Bernoulli: 


My method is somewhat different from yours but leads to the 
same result; to respond, as is just, to your openness with the 
same, here it is in a few words: upon replacing the curve with 
a polygon with infinitely many sides I see that of all possible 
cases (the curve) of quickest descent will be obtained if we 
choose on the broken line any three points, or vertices, A, C, 
and B, and C is such that of all possible points located on 
the horizontal line $l$ it alone yields the quickest path from 
A to B. In this way the task is reduced to the solution of an 
easy problem: given two points A and B and a horizontal 
line $l$ between them; find on that line the point C such that 
the path ACB is quickest. 


(We have changed Leibniz’s notation slightly, to fit that used in this text; 
he has B instead of C, C instead of B, and DE instead of $l$.) 

Johann Bernoulli proceeded differently. Imagine a nonhomogeneous 
optical medium consisting of $n$ homogeneous layers $s_1, \dots, s_n$ (see Figure 
7.3), say, $n$ sheets of different kinds of glass. Let the speed of propagation 
of light in the sheet $s_1$ be $\sqrt{2gy_1}$, in the sheet $s_2$, $\sqrt{2gy_2}$, ..., and in the 
sheet $s_n$, $\sqrt{2gy_n}$. What will be the time for the propagation of light if it is 
forced to move along the broken line $L_n$? Of course, the answer is $T_n$ as 
given by formula $(2_n)$. In other words, light, according to Fermat’s principle 
discussed in the third story, “solves” the very problem $(2_n)$ (if we stipulate 
that the velocity of propagation of light in the $i$th layer is $\sqrt{2gy_i}$). 

Beginning with a mechanical problem and following Johann Bernoulli we 
have arrived at an optical problem. This is the first application of the optical- 
mechanical analogy that was to yield so many discoveries in the works of 
Hamilton, Jacobi, de Broglie, and many others. 

Now we will deal with Johann Bernoulli’s solution of the brachistochrone 
problem. We apply Snel’s law (discussed in the third story) to the optical 
variant of the problem $(2_n)$. Let $\alpha_i$ be the incidence angle in the $i$th layer. 
By Snel’s law, 

(4)   $\frac{\sin\alpha_1}{\sqrt{2gy_1}} = \frac{\sin\alpha_2}{\sqrt{2gy_2}} = \dots = \frac{\sin\alpha_n}{\sqrt{2gy_n}} = \text{constant}.$

If we allow the layers to grow thinner and more numerous then, in the 
limit, (4) yields 

(5)   $\frac{\sin\alpha(x)}{\sqrt{2gf(x)}} = \text{constant},$

where $\alpha(x)$ is the angle between the tangent to the curve $y = f(x)$ at the 
point $(x, f(x))$ and the y-axis. The tangent of this angle is $1/f'(x)$. It 
follows that $f'(x) = \tan(\pi/2 - \alpha(x)) = \cos\alpha(x)/\sin\alpha(x)$, so that 

$\sin\alpha(x) = \frac{1}{\sqrt{1 + (f'(x))^2}}.$

This relation and (5) imply that 

$\sqrt{1 + (f'(x))^2}\,\sqrt{f(x)} = D,$

where $D$ is some constant. In other words, the function must satisfy the 
differential equation 

(6)   $y' = \sqrt{\frac{C - y}{y}}, \qquad C = D^2.$

In Bernoulli’s time it was known that (6) is the differential equation of a 
cycloid. 
For those with a measure of experience in integration, we integrate the 
equation (6): 

$y' = \sqrt{\frac{C - y}{y}} \iff \frac{\sqrt{y}\,dy}{\sqrt{C - y}} = dx.$

We make the substitution $y = C\sin^2(t/2) = C(1 - \cos t)/2$. Then 

$dx = \frac{\sqrt{C}\sin(t/2)\,d(C\sin^2(t/2))}{\sqrt{C}\cos(t/2)} = C\sin^2(t/2)\,dt = C(1 - \cos t)\,dt/2.$

Integrating the latter relation we obtain 

(7)   $x = \frac{C}{2}(t - \sin t) + C_1, \qquad y = \frac{C}{2}(1 - \cos t).$

If we replace $C/2$ by $R$ and $t$ by $\varphi$ in (7), then we obtain the equation 
(1) for a cycloid. 
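
The result can be checked against both the refraction law and the differential equation. The sketch below is illustrative; the function names, the sample value $C = 2$, and the value of $g$ are assumptions made for the example. It evaluates the quantity $\sin\alpha/\sqrt{2gy}$ along the cycloid $x = \frac{C}{2}(t - \sin t)$, $y = \frac{C}{2}(1 - \cos t)$, which should be constant, and the residual $y' - \sqrt{(C - y)/y}$, which should vanish on the descending part $0 < t < \pi$.

```python
import math

def snell_invariant(t, C=2.0, g=9.81):
    """sin(alpha)/sqrt(2*g*y) at the point of the cycloid with parameter t,
    where alpha is the angle between the tangent and the y-axis."""
    y = 0.5 * C * (1.0 - math.cos(t))
    dx_dt = 0.5 * C * (1.0 - math.cos(t))
    dy_dt = 0.5 * C * math.sin(t)
    sin_alpha = dx_dt / math.hypot(dx_dt, dy_dt)
    return sin_alpha / math.sqrt(2.0 * g * y)

def ode_residual(t, C=2.0):
    """y' - sqrt((C - y)/y) on the cycloid; should vanish for 0 < t < pi."""
    y = 0.5 * C * (1.0 - math.cos(t))
    slope = math.sin(t) / (1.0 - math.cos(t))      # dy/dx = (dy/dt)/(dx/dt)
    return slope - math.sqrt((C - y) / y)

for t in (0.3, 1.0, 2.0, 3.0):
    print(f"t = {t}: invariant = {snell_invariant(t):.6f}, residual = {ode_residual(t):.2e}")
```

The invariant takes the same value $1/\sqrt{2gC}$ at every sampled point, so the cycloid is indeed the path a light ray would follow in a medium where the speed at depth $y$ is $\sqrt{2gy}$.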

These formulas yield a simple prescription for constructing the cycloid that 
solves Johann Bernoulli’s problem. Note that all the cycloids (7) are similar 
as well as convex. For example, take any cycloid in (7) that has the point 
$A = (0, 0)$ as its left vertex. (See Figure 7.4.) Denote by $B'$ the point of 
intersection of the line $AB$ and the selected cycloid. To obtain the required 
cycloid we need only apply to the selected cycloid the similarity with center 
$(0, 0)$ and coefficient $|AB|/|AB'|$. 

FIGURE 7.4 
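
The similarity construction translates into a short computation. The sketch below is one possible realization, not the author's procedure: the function name `cycloid_through`, the bisection search, the sample point $B = (2, 1)$, and the value of $g$ are assumptions. On the unit cycloid the ratio $x/y = (\varphi - \sin\varphi)/(1 - \cos\varphi)$ increases with $\varphi$ on $(0, 2\pi)$, so the point $B'$ on the ray $AB$ is found by bisection, and the scaling factor then gives the radius $R$. Since a body starting at rest from the cusp moves with $\varphi$ growing at the constant rate $\sqrt{g/R}$, the descent time is $\varphi_B\sqrt{R/g}$; it is compared with the time along the straight chute.

```python
import math

def cycloid_through(a, b, tol=1e-12):
    """Return (R, phi_B) such that the cycloid x = R(phi - sin phi),
    y = R(1 - cos phi), with y measured downward and cusp at A = (0, 0),
    passes through B = (a, b) for phi = phi_B.  Requires a > 0 and b > 0."""
    target = a / b
    ratio = lambda phi: (phi - math.sin(phi)) / (1.0 - math.cos(phi))
    lo, hi = 1e-6, 2.0 * math.pi - 1e-6   # ratio increases on this interval
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    phi_b = 0.5 * (lo + hi)
    R = b / (1.0 - math.cos(phi_b))       # similarity coefficient applied to the unit cycloid
    return R, phi_b

a, b, g = 2.0, 1.0, 9.81
R, phi_b = cycloid_through(a, b)
t_cycloid = phi_b * math.sqrt(R / g)                    # phi grows at the constant rate sqrt(g/R)
t_line = math.hypot(a, b) * math.sqrt(2.0 / (g * b))    # straight chute from A to B
print(R, phi_b, t_cycloid, t_line)
```

For this sample point the cycloid is noticeably quicker than the straight line, in agreement with Galileo's remark quoted at the head of the story.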

The solution of the brachistochrone problem gave its author the tremen- 
dous joy of original discovery. Bernoulli said: 


I cannot refrain from expressing once more my amazement at 
the noted unexpected identity of Huygens’ tautochrone and 
our brachistochrone ... Nature always operates in the sim- 
plest manner. Thus in this case it renders two different ser- 
vices by means of one and the same curve. 


Johann Bernoulli’s method made possible the solution of a number of 
other remarkable problems in optics, mechanics, and geometry. Let’s look at 
two such problems. 


PROBLEM 1. What are the trajectories of light rays in an atmosphere for 
which the velocity of propagation of light is proportional to the altitude? 


This problem was investigated by l’Hospital, author of the first textbook 
of analysis in history. (Alas, it was recently discovered that the essential part 
of this textbook is an edited version of the lectures given to l’Hospital by 
the very same Johann Bernoulli.) L’Hospital succeeded in integrating the 
equation and in answering the question. After approximately 200 years, it 
turned out that the solution to Problem 1 is directly related to Lobachevskian 
geometry: the trajectories of light rays coincide with Lobachevskian straight 
lines in the Poincaré model. 


PROBLEM 2. Find the minimal surface of revolution. 


Here we must add a few words to clarify the problem. Let $y = f(x)$ be 
a nonnegative function passing through two points $(x_0, y_0)$ and $(x_1, y_1)$ in 
the plane. Rotating it about the x-axis yields a surface of revolution. Its area 
is given by the integral 

$S = 2\pi \int_{x_0}^{x_1} f(x)\sqrt{1 + (f'(x))^2}\,dx.$

The task is to find the curve $y = f(x)$ that minimizes the area $S$. 

The problem of the minimal surface was solved by Johann Bernoulli and 
by Leibniz. The reader should try to solve these two problems himself. I will 
discuss these two problems further in the fourteenth story. 

The brachistochrone problem was destined to play an important role in 
mathematical analysis. In fact, it turned out to be the first of a series of 
problems underlying the formulation of the calculus of variations. 

Shortly before the brachistochrone problem emerged, Newton considered 
a similar problem. Newton’s problem is the subject of the next story. 


The Eighth Story 


8 


Newton’s Aerodynamical Problem 


This book (Newton’s Principia) will forever remain a 
monument to the profundity of a genius. 


P. Laplace 


Do geniuses make mistakes? Usually, anyone who asks such a question 
expects an affirmative answer. There is an element of comfort in the knowl- 
edge that even geniuses can err. Sometimes their mistakes elicit comments 
marked by less than benevolent enthusiasm. Newton too was on the receiv- 
ing end of this kind of enthusiasm. Thus, in a book on optimal control, you 
can read that 


Newton formulated a variational problem dealing with a solid 
of revolution that offers least resistance to a gas. He assumed 
a physically absurd law of resistance. As a result, the prob- 
lem he posed has no solution (the more serrated the profile, 
the smaller the resistance) ... Had Newton’s arguments been 
even approximately correct, we would not need expensive 
wind tunnel experiments today. 


What is the problem in question, and how justified is the quoted criticism? 
This story will look at just these issues. 

Newton’s Mathematical principles of natural philosophy appeared in 1687. 
No other work in the mathematical literature can be compared with it. A 
description of the system of the universe discovered by Newton, it contains 
the kind of discovery that can be made only once. Lagrange called it “the 
greatest work of the human mind,” and Laplace lauded it as “a monument 
to the profundity of genius.” 

The book deals with the basic laws of mechanics discovered by Newton, as 
well as the laws of planetary motion and other fundamental facts. However, 
part of the text is devoted to many special problems. 

While discussing the resistance offered to material bodies by the medium 
through which they move, Newton tossed off, as if in passing, the following 
phrase: 


If the figure DNFG is such a curve, that if, from any point 
thereof, as N, the perpendicular NM let fall on the axis 
AB, and from the given point G there be drawn the right line 
GR parallel to a right line touching the figure in N, and 
cutting the axis produced in R, MN becomes to GR as $GR^3$ 
to $4BR \cdot GB^2$, the solid described by the revolution of this 
figure about its axis AB, moving in the before-mentioned 
rare medium from A towards B, will be less resisted than 
any other circular solid whatsoever, described of the same 
length and breadth. 

Principia, v. 1 pp. 334-334 


This phrase attracted the attention of Newton’s contemporaries some nine 
years later, in 1696, when Johann Bernoulli posed his brachistochrone 
problem (discussed in our seventh story). 

While the brachistochrone problem elicited universal admiration, New- 
ton’s problem fared like poor, neglected Cinderella. As a rule, it was brought 
up—if at all—as an instance of the error of genius. But just as Cinderella’s 
day came, so did the day of Newton’s problem. 

Let’s try to understand Newton’s idea. 

When constructing shells, torpedoes, or rockets, one tries to shape them so 
as to minimize the resistance they will meet while in motion. Newton writes: 
“figures may be compared together as to their resistance; and those may be 
found which are most apt to continue their motions in resisting mediums.” 
There is just so much symmetry we can give to a boat or a plane, for example. 
But when it comes to the head of a shell, torpedo, or rocket, it stands to reason 
that their cross-sections must be circular; in other words, they must take the
shape of a solid of revolution. But which solid of revolution? A sphere, a 
cone, a spindle, or yet another circular shape? Such questions cannot be 
answered without computations, without solving a maximum or minimum 
problem. Newton posed just such a problem (and gave its solution in the 
phrase quoted above). 

The very first approximation to the problem follows. 


PROBLEM. Find the solid of revolution of given length and width that is 
subject to least resistance while moving in some medium. 


A few clarifying remarks are in order. The terms used in the formulation of 
the problem (length, width, motion, and medium) require precise description. 

Newton assumes that the front and back of the solid of revolution are the 
same, that is, that the solid is symmetric with respect to the plane passing 
through the midpoint of the axis of revolution and perpendicular to it. Thus 
the length of the solid is its length measured along its axis of rotation, and 
its width is the radius of its middle section. This being so, it is clear that it 
suffices to consider half the solid. This is what we will do in the sequel. 

Now we consider the motion. Following Newton, we will assume that the 
body is moving with constant velocity v. 

Finally, let’s look at the medium. This is the most delicate and central 
issue. Newton calls it a “rare” medium. He thinks of a rare medium as 
“consisting of equal particles freely disposed at equal distances from each 
other.” Each of the motionless particles has a fixed mass m and is a perfectly 
elastic ball. Newton assumed that the body itself is also perfectly elastic. 
This means that when one of the small balls collides with the moving body, 
it recoils in accordance with the law that “the angle of incidence is equal to 
the angle of reflection.” 

We could now pose the general problem and embark on its solution. How- 
ever, we will proceed differently. Our first step will be the solution of a 
simpler problem. (Newton himself solves this simpler problem first.) 


PROBLEM OF THE FRUSTUM OF A CONE. Determine the dimensions of the 
frustum of a cone subject to least resistance when moving in a rare medium
given its base and altitude. 

We will determine the resistance offered by the frustum of a cone moving
in a rare medium. 

Let H be the altitude of the frustum and R the radius of the upper base. 
(See Figure 8.2; the boldface part of this figure appears in the Principia). 

Newton writes: 


For since the action of the medium upon the body is the 
same... whether the body move in a quiescent medium, or 
whether the particles of the medium impinge with the same 
velocity upon the quiescent body, let us consider the body as 
if it were quiescent. 

Principia, vol. 1, p. 331 


We will do the same, that is, we will assume that the frustum is at rest and 
that the medium “comes against it” from below with velocity v. 

The part of the surface of the frustum that is subjected to collisions with
the particles of the medium is its lower base and side. First, we compute
the resistance to which the lower base is subject. Let its radius be $x$. The
particles that impinge on this base during a unit of time were originally in a
cylinder whose base is the same as the lower base of the frustum and whose
altitude is $v$. The volume $V_1$ of this cylinder is $\pi x^2 v$. Let $\rho$ be the density
of the medium and $m$ the mass of a single particle. Then the number of
particles that impinge on the lower base of the frustum during a unit of time
is $N_1 = (\rho/m)V_1 = (\rho/m)\pi x^2 v$. Upon collision with the lower base, each particle
reverses its velocity, so that its momentum increases by $-2mv$. By Newton’s
third law, the frustum gains an opposite increase of momentum. This
means that its total gain of momentum (due to $N_1$ particles) is
$N_1 \cdot 2mv = 2\pi\rho x^2 v^2$. Analogous considerations apply to the side of the frustum. The
side collides with particles contained in a hollow cylinder whose volume $V_2$ is
$\pi(R^2 - x^2)v$. The number of particles impinging on the side of the frustum
is $N_2 = (\rho/m)V_2 = (\rho/m)\pi(R^2 - x^2)v$. As for the change in momentum, we
note that the gain in momentum of a particle reflected from the side of the
frustum is $m(v_2 - v_1)$. (See Figure 8.3.) This vector must be projected on the
y-axis. It is easy to see from Figure 8.3 that this projection is $-2mv\cos^2\varphi$,
where $\varphi$ is the angle between a generator of the frustum and the plane of its
lower base. It follows that the total gain of momentum due to the particles
impinging on the side of the frustum is

$$N_2 \cdot 2mv\cos^2\varphi = 2\pi\rho(R^2 - x^2)v^2\cos^2\varphi.$$
We will also note (for future use) the following result. Consider the
frustum of a cone obtained by revolving about the y-axis a segment [AB] whose
endpoints have abscissas $a$ and $b$ and that makes an angle $\varphi$ with the
x-axis. The side of this frustum is subject to a force of resistance given by

(1)  $F = K(b^2 - a^2)\cos^2\varphi, \qquad K = 2\pi\rho v^2.$

It follows that the total resistance offered by the frustum is given by

$$F(x) = K\left[x^2 + (R^2 - x^2)\cos^2\varphi\right], \qquad K = 2\pi\rho v^2.$$

The expression for $\cos\varphi$ in terms of $x$ is

$$\cos\varphi = \frac{R - x}{\sqrt{(R - x)^2 + H^2}}.$$

Since the constant K has no effect on the behavior of maxima and minima,
we can ignore it.

We can formalize the problem of the frustum of a cone as follows: 

PROBLEM 1. Find the minimum of the function

(2)  $f(x) = x^2 + (R^2 - x^2)\,\dfrac{(R - x)^2}{(R - x)^2 + H^2}$

for $0 \le x \le R$.
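
Problem 1 can be explored numerically before any theory is applied. The following is only a rough sketch: the function names, the grid search, and the sample values R = H = 1 are illustrative assumptions, not part of Newton's argument. It evaluates the resistance (divided by K) of the frustum with small radius x and looks for the smallest value on a fine grid.

```python
def frustum_resistance(x, R, H):
    """Resistance of the frustum divided by K, as a function of the
    small radius x (0 <= x <= R), for large radius R and altitude H."""
    cos2 = (R - x) ** 2 / ((R - x) ** 2 + H ** 2)  # cos^2 of the base angle
    return x ** 2 + (R ** 2 - x ** 2) * cos2


def minimize_on_grid(func, lo, hi, steps=100_000):
    """Crude grid search (an assumption of this sketch, not a method
    used in the text): return the grid point with the least value."""
    best_x, best_val = lo, func(lo)
    for i in range(1, steps + 1):
        x = lo + (hi - lo) * i / steps
        val = func(x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val


# Hypothetical sample data: R = 1, H = 1.
R, H = 1.0, 1.0
x_best, f_best = minimize_on_grid(lambda x: frustum_resistance(x, R, H), 0.0, R)
print(f"best small radius x = {x_best:.4f}, resistance/K = {f_best:.4f}")
print("pointed cone (x = 0):", frustum_resistance(0.0, R, H))
print("flat disk   (x = R):", frustum_resistance(R, R, H))
```

For these sample values the best frustum is neither the pointed cone (x = 0) nor the flat disk (x = R) but something in between, which is the point of Newton's simpler problem.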


As a rule, a particular translation of a problem into the language of math- 
ematics is not unique. In the present case we can also give the following 
alternative description of our problem. 

In Figure 8.2, extend the segment CF to the point S of its intersection
with the y-axis and take the length of the segment OS as the new variable
$z$. The similarity of the triangles SOC and SDF implies that $x/R =
(z - H)/z$, whence $R - x = RH/z$ and $x = R(z - H)/z$. Substituting these
expressions in (2) we obtain a new expression for the force of resistance
(divided by K) in terms of the new variable:

(3)  $g(z) = \dfrac{(z - H)^2 + R^2}{z^2 + R^2}\,R^2.$

Here $z$ varies from $H$ to infinity. We have obtained the following
formulation of the problem of the frustum of a cone (we can ignore the
multiplicative constant $R^2$).


PROBLEM 1′. Find the minimum of the function

(2′)  $h(z) = \dfrac{(z - H)^2 + R^2}{z^2 + R^2}$

subject to the condition $z \ge H$.


This problem is so simple that we can solve it without the use of the 
differential calculus. Our answer will take the same form as Newton’s answer. 
Let $m$ be the least value of the function $h$, so that $h(z) \ge m$ for $z \ge H$.
Also, $m \le h(H) = R^2/(R^2 + H^2) < 1$. We make the obvious transformations

     $h(z) \ge m$  for $z \ge H$
(4)  $\iff (z - H)^2 + R^2 - mz^2 - mR^2 \ge 0$  for $z \ge H$
     $\iff z^2(1 - m) - 2zH + H^2 + R^2(1 - m) \ge 0$  for $z \ge H$.

If it turns out that there is an $m < 1$ for which the inequality (4) holds
for all $z$, and if for this $m$ we can find a $\hat z > H$ that turns this inequality
into an equality, then our problem will have been solved. We will try to find
the required $m$ and $\hat z$.

If $az^2 + 2bz + c \ge 0$ for all $z$ and $a\hat z^2 + 2b\hat z + c = 0$, then it follows
that $D = b^2 - ac = 0$ and $\hat z = -b/a$ (why?). In our case, $a = 1 - m$, $b = -H$,
$c = H^2 + R^2(1 - m)$. The equality $D = 0$ yields

$$D = H^2 - (1 - m)\left(H^2 + R^2(1 - m)\right) = 0 \implies R^2m^2 - (2R^2 + H^2)m + R^2 = 0,$$

whence

$$m = \frac{2R^2 + H^2 - H\sqrt{4R^2 + H^2}}{2R^2}.$$

We have discarded the “+” sign because $m$ must be less than 1. Further,

(5)  $\hat z = -b/a = H/(1 - m) = \sqrt{R^2 + (H/2)^2} + H/2 \quad (> H).$

We have found $m < 1$ and $\hat z > H$ such that the discriminant of the
equation $z^2(1 - m) - 2zH + H^2 + R^2(1 - m) = 0$ is zero, and so the equation
has just one root $\hat z > H$, that is, $h(z) \ge m$ and $h(\hat z) = m$. Problem 1′ is
solved.
What follows is Newton’s answer to the problem of the frustum of a cone:
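
The closed-form answer lends itself to a quick numerical sanity check. The sketch below is only an illustration under assumed sample values (R = 3, H = 2); the function names are hypothetical. It computes the least value and the height of the vertex S from the formulas just derived and compares them with a direct sampling of the function of Problem 1′.

```python
import math


def h(z, R, H):
    """Resistance of the frustum divided by K * R**2, in terms of z = |OS|."""
    return ((z - H) ** 2 + R ** 2) / (z ** 2 + R ** 2)


def newton_frustum(R, H):
    """Return (m, z_hat): the least value of h and the point where it is
    attained, computed from the discriminant argument and formula (5)."""
    m = (2 * R**2 + H**2 - H * math.sqrt(4 * R**2 + H**2)) / (2 * R**2)
    z_hat = math.sqrt(R**2 + (H / 2) ** 2) + H / 2
    return m, z_hat


# Hypothetical sample data: base radius R = 3, altitude H = 2.
R, H = 3.0, 2.0
m, z_hat = newton_frustum(R, H)

# Sample h on z >= H and confirm that no sampled value falls below m.
samples = [H + 0.01 * i for i in range(5001)]
lowest_sampled = min(h(z, R, H) for z in samples)

print(f"m = {m:.6f}, z_hat = {z_hat:.6f}, h(z_hat) = {h(z_hat, R, H):.6f}")
print(f"lowest sampled value of h = {lowest_sampled:.6f}")
print(f"small radius of the optimal frustum = {R * (z_hat - H) / z_hat:.6f}")
```

The value z_hat is exactly the length |OQ| + |QS| of Newton's geometric construction, with |OQ| = H/2 and |QS| = |QC|.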
What follows is Newton’s answer to the problem of the frustum of a cone: 


As if upon the circular base CKBL [Figure 8.2] from the 
center O, with the radius OC, and the altitude OD, one 
would construct a frustum CBGF of a cone, which should
meet with less resistance than any other frustum constructed 
with the same base and altitude and going towards D in 
the direction of its axis: bisect the altitude OD in Q, and 
produce OQ to S, so that QS may be equal to QC, and 
S will be the vertex of the cone whose frustum is sought. 
Principia, vol. 1, p. 333 


It is easy to see that the content of Newton’s geometric answer is the same 
as that of the algebraic answer given by formula (5). 

The cone subject to least resistance is indeed blunt rather than pointed. 
Newton goes even further, letting the altitude H tend to zero. Then $\hat z$
tends to R, and the angle at the base of the cone tends to 45°. These facts 
prompt Newton to suggest replacing an oval body with a blunt one; more 
specifically, to place in front a circular disk that makes an angle of 135° 
with the adjoining surface. “This Proposition,” says Newton, “I conceive may
be of use in the building of ships.” 


Newton’s problem for a broken line with two links. Let’s now take the next- 
to-last step in the solution of Newton’s problem. We propose to solve the 
special case of this problem for broken lines with two links for which the 
breaks are located on the line x = R/2 . In other words, we imitate Leibniz’s 
method. 

The precise statement of the present problem calls for finding a point A 
on the line x = R/2 such that the surface obtained by revolving the broken 
line OAB (where B has coordinates (R, H)) about the y-axis (Figure 8.4) 
is subject to least resistance in Newton’s rare medium. 

Suppose that the segment [OA] makes an angle $\varphi_0$ with the x-axis and
[AB] an angle $\varphi_1$. Let $y$ denote the ordinate of A. Using formula (1) we
find that the solid of revolution generated by the broken two-link line OAB
is subject to a force of

$$F = K\left[(R/2)^2\cos^2\varphi_0 + 3(R/2)^2\cos^2\varphi_1\right], \qquad K = 2\pi\rho v^2.$$

This means that we must minimize the function

$$g_2(y) = \frac{1}{a^2 + y^2} + \frac{3}{a^2 + (H - y)^2};$$

here we have again neglected the multiplier $KR^4/2^4$ and have set $R/2 = a$.

It is easy to see that if $|y|$ increases without bound, then the function
$g_2$ stays positive and tends to zero. This means that the greatest lower bound
of $g_2$ is zero, but this value is never attained (a similar situation occurs and
will be analyzed in detail in the eleventh story). At first glance it may appear
that our problem is pointless. However, physical considerations show that
we must restrict $y$ to the interval $[0, H]$. This important point calls for a
clarification.

If y <0, then we obtain a broken line OA’B, as shown in Figure 8.5. 
This broken line gives rise to a solid of revolution with a crater. Some of the 
small balls that constitute the rare medium would be reflected a number of 
times from the surface of the crater. The body would offer greater resistance, 
and that resistance would be governed by a different law. Thus the physics of 
the problem disallows negative values of y. Similarly, values of y greater 
than H must be ruled out. In other words, the implicit assumption is the 
monotonicity of the revolved curve. This leads to the following formalization 
of the problem of the broken line with two links. 


PROBLEM 2. Find the minimum of the function

$$g_2(y) = \frac{1}{a^2 + y^2} + \frac{3}{a^2 + (H - y)^2}$$

subject to the restriction $0 \le y \le H$.


The solution of Problem 2, which is easy to obtain by means of the
differential calculus, can be stated as follows:

(a) There exists a $\delta > 0$ such that for $0 < H \le \delta$ the minimum is
attained at 0;

(b) For $H > \delta$ the minimum is attained at an interior point of the interval
$[0, H]$. Also,

$$\frac{R}{4}\tan\varphi_0\cos^4\varphi_0 = \frac{3R}{4}\tan\varphi_1\cos^4\varphi_1.$$

This means that for sufficiently small values of H the solid of revolution
generated by a two-link broken line is blunt (that is, is the frustum of a cone)
and for larger values of H it is pointed (that is, it consists of a cone and the
side of the frustum of a cone).
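
The two regimes (a) and (b) can be seen in a small numerical experiment. This is merely a sketch under assumed sample values (a = 1, that is R = 2, with H = 0.5 and H = 10); the grid search and the function names are illustrative choices, and the threshold value itself is not computed here.

```python
def g2(y, a, H):
    """Resistance of the two-link solid, up to a constant factor; the break
    lies on the line x = a = R/2 and has ordinate y."""
    return 1.0 / (a**2 + y**2) + 3.0 / (a**2 + (H - y) ** 2)


def argmin_on_grid(a, H, steps=100_000):
    """Grid search for the minimizer of g2 on [0, H] (illustrative only)."""
    best_y, best_val = 0.0, g2(0.0, a, H)
    for i in range(1, steps + 1):
        y = H * i / steps
        val = g2(y, a, H)
        if val < best_val:
            best_y, best_val = y, val
    return best_y


def stationarity_gap(y, a, H):
    """tan(phi0)*cos^4(phi0) - 3*tan(phi1)*cos^4(phi1); it vanishes at an
    interior minimum, which is the relation stated in (b) divided by R/4."""
    t0 = y / a            # tan(phi0)
    t1 = (H - y) / a      # tan(phi1)
    return t0 / (1 + t0**2) ** 2 - 3 * t1 / (1 + t1**2) ** 2


a = 1.0  # hypothetical sample value, i.e. R = 2
y_small = argmin_on_grid(a, H=0.5)   # short body
y_large = argmin_on_grid(a, H=10.0)  # long body
print("H = 0.5: minimizer y =", y_small)
print("H = 10 : minimizer y =", round(y_large, 3),
      " stationarity gap =", stationarity_gap(y_large, a, 10.0))
```

For the short body the minimizer sits at y = 0 (a blunt frustum), while for the long body it is an interior point at which the relation in (b) holds to within the accuracy of the grid.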

Now let’s look at the last intermediate step in solving Newton’s problem. 


Newton’s problem for a broken line with N links. Consider the vertical
lines $l_1, l_2, \ldots$ whose respective equations are

$$x = 1/n, \quad x = 2/n, \quad \ldots$$

(see Figure 8.6), and, if R is not of the form $k/n$, the vertical line $x = R$. We
consider the totality of monotonically increasing broken lines with breaks
on the lines $l_k$ joining (0, 0) to (R, H). We will try to find that one of
our curves which, when revolved about the y-axis, generates a solid of least
resistance in a rare medium. For an admissible curve, let $y_k$ denote the
ordinate of the vertex on the line $l_k$, and let $\varphi_k$ denote the angle between
the link joining the vertices on the lines $l_k$ and $l_{k+1}$ and the x-axis. Assume,
for the sake of simplicity, that $R = N/n$. By formula (1), the body generated
by revolving the curve about the y-axis is subject, in a rare medium, to the
force of resistance

(7)  $F = K\left\{\left(\tfrac{1}{n}\right)^2\cos^2\varphi_0 + \left[\left(\tfrac{2}{n}\right)^2 - \left(\tfrac{1}{n}\right)^2\right]\cos^2\varphi_1 + \cdots + \left[\left(\tfrac{N}{n}\right)^2 - \left(\tfrac{N-1}{n}\right)^2\right]\cos^2\varphi_{N-1}\right\}$

     $= \dfrac{K}{n^2}\left[\cos^2\varphi_0 + 3\cos^2\varphi_1 + \cdots + (2N - 1)\cos^2\varphi_{N-1}\right],$

$$K = 2\pi\rho v^2, \qquad \cos\varphi_k = \frac{1/n}{\sqrt{1/n^2 + (y_{k+1} - y_k)^2}}.$$

Thus Newton’s problem for an N-link broken line is the problem of
minimizing the force F for all choices of the (N−1)-tuples $(y_1, \ldots, y_{N-1})$ with
$0 \le y_1 \le y_2 \le \cdots \le y_{N-1} \le H$.

To solve this problem, we’ll make use of the experience acquired in solving
Problem 2, for, in a sense, the problem for an N-link broken line is made up
of N − 1 problems like Problem 2. Indeed, assume that we have been able
to solve the problem for an N-link broken line and that $(\hat y_1, \ldots, \hat y_{N-1})$ are
the ordinates of the vertices of the minimal broken line. Let us fix all but its
kth vertex and choose its value so as to minimize the value of the resistance.
This is the same as solving a variant of Problem 2 where it is required to find
the minimum of the function

$$g_k(y) = \frac{1}{n^2}\left[(2k - 1)\cos^2\varphi_{k-1} + (2k + 1)\cos^2\varphi_k\right]
= \frac{1}{n^4}\left[\frac{2k - 1}{1/n^2 + (y - \hat y_{k-1})^2} + \frac{2k + 1}{1/n^2 + (\hat y_{k+1} - y)^2}\right]$$

for $\hat y_{k-1} \le y \le \hat y_{k+1}$. The function $g_k(y)$ is made up of the (k − 1)th and
the kth term of the sum (7) for the force F, except that the ordinate of the
kth vertex is not fixed and is denoted by $y$ (as usual, we have left out the
multiplier K). It is clear that the solution of this problem is $\hat y_k$.

The problem of minimizing the function $g_k$ is similar to Problem 2. It
follows that, mutatis mutandis, the conclusions that hold for Problem 2 apply
here. Specifically,

(a) there exists a number $\delta_k > 0$ such that for $\hat y_{k+1} - \hat y_{k-1} \le \delta_k$ the
minimum is attained for $\hat y_k = \hat y_{k-1}$;

(b) for $\hat y_{k+1} - \hat y_{k-1} > \delta_k$ we have the equality

(8)  $\left(k - \tfrac12\right)\tan\varphi_{k-1}\cos^4\varphi_{k-1} = \left(k + \tfrac12\right)\tan\varphi_k\cos^4\varphi_k.$

Together, (a) and (b) imply that the extremal broken line follows for a
time the x-axis, $\hat y_1 = \cdots = \hat y_s = 0$, then goes up as per rule (8). The latter
means that the value of $\varphi_{s+1}$ is obtained from the equality

$$\left(s + \tfrac12\right)\tan\varphi_s\cos^4\varphi_s = \left(s + \tfrac32\right)\tan\varphi_{s+1}\cos^4\varphi_{s+1},$$

the value of $\varphi_{s+2}$ from the equality

$$\left(s + \tfrac32\right)\tan\varphi_{s+1}\cos^4\varphi_{s+1} = \left(s + \tfrac52\right)\tan\varphi_{s+2}\cos^4\varphi_{s+2},$$

and so on.

The force of resistance for an N-link broken line can be written directly
in terms of the ordinates of the vertices of the broken line. To this end, we
rewrite (7) as

$$F = \frac{K}{n^2}\sum_{k=0}^{N-1}(2k + 1)\,\frac{1/n^2}{1/n^2 + (y_{k+1} - y_k)^2}.$$

By making rather obvious changes, we obtain for F the expression

$$F = 2K\sum_{k=0}^{N-1}\frac{x_k\,\Delta x_k}{1 + (\Delta y_k/\Delta x_k)^2};$$

here $\Delta y_k = y_{k+1} - y_k$, $\Delta x_k = 1/n$, and $x_k = (k + \tfrac12)/n$ is the abscissa of
the midpoint of the kth link (with $y_0 = 0$ and $y_N = H$).

Bearing in mind the form of integral sums and the approximate equality
$\Delta y_k/\Delta x_k \approx f'(x_k)$, we see that as N tends to infinity, our sum tends to the
integral

$$F = 2K\int_0^R \frac{x\,dx}{1 + (f'(x))^2}.$$

Solution of Newton’s problem. It can be shown that as N increases, the 
minimal N-link broken line tends to the minimal curve that is the solution 
of Newton’s problem. It follows (recall our description of the minimal N- 
link broken line) that the minimal curve is constructed as follows: at first 
the extremal function f(x) coincides with the x-axis, that is, f(x) =0 for 
0 < x < a, and then its values go up along some curve (Newton’s curve) 
subject to the condition 


(9)  $x\tan\varphi(x)\cos^4\varphi(x) = \text{constant}.$


Here $\varphi(x)$ denotes the angle between the tangent to the graph of the function
$y = f(x)$ at $(x, f(x))$ and the x-axis. The equality (9) is the limiting form
of the equalities (8) associated with our solution of the N-link broken line
problem.

We will now rewrite (9). In view of the geometric sense of the derivative,
we have $\tan\varphi(x) = f'(x)$. Hence $\cos\varphi(x) = [1 + (f'(x))^2]^{-1/2}$. But then

(10)  $\dfrac{x f'(x)}{\left(1 + (f'(x))^2\right)^2} = \text{constant}.$


FIGURE 8.7


Equation (10) is the differential equation of Newton’s curve. 

A final remark. At the moment when the horizontal curve goes over into
Newton’s curve, the derivative of Newton’s curve is 1 (a datum that “... may
be of use in the building of ships”). The differential equation (10) and the
condition that the derivative of the curve at the break is 1 make it possible to
find the equation of Newton’s curve. In the fourteenth story, we will integrate
this equation. In the meantime we will simply present the formulas for the
(x, y) coordinates of Newton’s curve $y = f(x, c)$:

(11)  $x = \dfrac{c}{2}\left(\dfrac{1}{u} + 2u + u^3\right), \qquad y = \dfrac{c}{2}\left(\log\dfrac{1}{u} + u^2 + \dfrac{3}{4}u^4\right) - \dfrac{7}{8}c.$

Here $c$ is determined by the condition $f(R, c) = H$.
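
The parametric formulas can be checked against the differential equation (10). The sketch below rests on an assumed reading of the parametrization, namely x = (c/2)(1/u + 2u + u³) and y = (c/2)(log(1/u) + u² + (3/4)u⁴) − (7/8)c with parameter u ≥ 1; the function names and the finite-difference step are illustrative. Under that reading, u turns out to be the slope of the curve, the break (u = 1) lies at the point (2c, 0), and the left-hand side of (10) equals c/2 everywhere.

```python
import math


def newton_curve_point(u, c):
    """Point (x, y) of Newton's curve for parameter u >= 1, under the
    parametrization assumed in the lead-in (an assumption of this sketch)."""
    x = (c / 2) * (1 / u + 2 * u + u**3)
    y = (c / 2) * (math.log(1 / u) + u**2 + 0.75 * u**4) - 7 * c / 8
    return x, y


def slope_numeric(u, c, du=1e-6):
    """Central-difference estimate of dy/dx at parameter value u."""
    x1, y1 = newton_curve_point(u - du, c)
    x2, y2 = newton_curve_point(u + du, c)
    return (y2 - y1) / (x2 - x1)


c = 2.0  # hypothetical sample value of the parameter
for u in (1.0, 1.5, 3.0):
    x, y = newton_curve_point(u, c)
    invariant = x * u / (1 + u**2) ** 2      # left-hand side of (10) with f' = u
    print(f"u = {u}: x = {x:.4f}, y = {y:.4f}, "
          f"slope = {slope_numeric(u, c):.6f}, invariant = {invariant:.6f}")
```

At u = 1 the printout shows slope 1 and y = 0, which matches the condition at the break described above.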

A few questions remain. 

1. When all is said and done, how is Newton’s problem solved? How do 
we construct the curve of given length and width which, when revolved about 
the y-axis yields the surface of the body subject to least resistance in a rare 
medium? 

The equations in (11) show that all presumed minimal curves depend on 
one parameter and are similar to each other. This means that to solve New- 
ton’s problem we can proceed as in the case of the brachistochrone. 

We draw the curve for, say, $c = 2$. (See Figure 8.7.) It can be shown that
the graph of $y = f(x, 2)$ intersects any line $y = kx$, $k > 0$, just once.

We draw the line $y = (H/R)x$ joining the origin to the point (R, H). Let
$(\hat x, \hat y)$ be the point of intersection of this line with the graph of the function
$y = f(x, 2)$. Set $\hat c = 2H/\hat y$ (the curve with parameter $c$ is the curve with
$c = 2$ scaled by the factor $c/2$). Then $y = f(x, \hat c)$ is the curve that passes
through the required point (R, H). Of all the curves in the family (11),
this curve is the only one having this property. As such, it is the solution of
Newton’s problem.
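
The construction by similarity can be imitated on a computer. What follows is a sketch under stated assumptions: it uses the reading x = (c/2)(1/u + 2u + u³), y = (c/2)(log(1/u) + u² + (3/4)u⁴) − (7/8)c of the parametric formulas, with u ≥ 1, and a plain bisection; the function names and the sample point (R, H) = (1, 2) are hypothetical. Since every curve of the family is the c = 2 curve scaled by c/2, the sketch scales by H/ŷ, that is, it takes c = 2H/ŷ.

```python
import math


def newton_curve_point(u, c):
    """Point of Newton's curve for parameter u >= 1 (assumed parametrization)."""
    x = (c / 2) * (1 / u + 2 * u + u**3)
    y = (c / 2) * (math.log(1 / u) + u**2 + 0.75 * u**4) - 7 * c / 8
    return x, y


def curve_through(R, H):
    """Return (c_hat, u_hat): the member of the family passing through
    (R, H) and the parameter value at which it gets there."""
    k = H / R

    def ratio(u):
        x, y = newton_curve_point(u, 2.0)   # reference curve, c = 2
        return y / x

    lo, hi = 1.0, 2.0
    while ratio(hi) < k:                    # bracket the intersection
        hi *= 2.0
    for _ in range(200):                    # bisection on y/x = H/R
        mid = 0.5 * (lo + hi)
        if ratio(mid) < k:
            lo = mid
        else:
            hi = mid
    u_hat = 0.5 * (lo + hi)
    _, y_hat = newton_curve_point(u_hat, 2.0)
    return 2.0 * H / y_hat, u_hat           # rescale so that y becomes H


R, H = 1.0, 2.0                             # hypothetical sample data
c_hat, u_hat = curve_through(R, H)
x_end, y_end = newton_curve_point(u_hat, c_hat)
print(f"c_hat = {c_hat:.6f}, end point = ({x_end:.6f}, {y_end:.6f})")
print(f"radius of the flat front disk = {2 * c_hat:.6f}")
```

The bisection relies on the fact, quoted above without proof, that the reference curve meets each line through the origin exactly once.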

2. What is the meaning of Newton’s mysterious phrase quoted in the 
beginning of this story? What connection is there between this phrase and 
our solution of Newton’s problem? 

Consider Figure 8.1 (which contains some of the required constructions). We
set $|MN| = x$, $|MB| = y$, and $|BG| = b$ and denote by $\varphi$ the angle between
the segment [MN] and the tangent to the curve at N. $\varphi$ is equal to the
angle BGR. Also, $\tan\varphi = f'(x)$. This implies that

$$|BR|/|BG| = \tan\varphi \implies |BR| = b f'(x).$$

Hence

$$|GR|^2 = |BG|^2 + |BR|^2 = b^2\left(1 + (f'(x))^2\right).$$

Now Newton’s proportion

$$|MN| : |GR| = |GR|^3 : \left(4|BR| \cdot |GB|^2\right)$$

yields

(12)  $\dfrac{x}{b\sqrt{1 + (f'(x))^2}} = \dfrac{b^3\left(1 + (f'(x))^2\right)^{3/2}}{4b f'(x)\,b^2} \implies \dfrac{x f'(x)}{\left(1 + (f'(x))^2\right)^2} = \dfrac{b}{4}.$


This is the differential equation for Newton’s curve. (See equations (10)
and (12).)
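
The algebra connecting Newton's proportion with the differential equation is easy to confirm with numbers. In this sketch the sample values of b and of the slope p = f'(x), as well as the function name, are arbitrary assumptions; x is computed from the differential equation, and the two ratios of the proportion are then compared.

```python
import math


def newton_proportion_sides(b, p):
    """Given |BG| = b and slope p = f'(x), take x = |MN| from the equation
    x * p / (1 + p^2)^2 = b / 4 and return both ratios of the proportion
    |MN| : |GR| = |GR|^3 : (4 |BR| |GB|^2)."""
    x = (b / 4) * (1 + p**2) ** 2 / p
    BR = b * p                         # |BR| = b * tan(phi)
    GR = math.sqrt(b**2 + BR**2)       # right triangle BGR
    return x / GR, GR**3 / (4 * BR * b**2)


for b, p in ((1.0, 1.0), (2.0, 1.7), (0.5, 4.0)):
    left, right = newton_proportion_sides(b, p)
    print(f"b = {b}, p = {p}: {left:.10f} = {right:.10f}")
```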

Newton also anticipated the bluntness of the solid of revolution and the 
break at G (where, we recall, the angle is 135°). Also, while answering the 
first question, we showed that, given the length and width, the differential 
equation (12) and the condition at the break determine the required curve 
uniquely. We can therefore say that Newton solved the aerodynamical problem 
completely. 

3. Why was Newton’s curve relegated to the role of the unfortunate Cin- 
derella for 300 years? Why was Newton’s idea not fully understood for so 
long? 

I said earlier that the brachistochrone problem opened a new era—the era 
of the calculus of variations. This subject experienced intense growth for 
almost two centuries. As I mentioned earlier, it is only recently that we have 
realized that many technical, and for the most part cosmological, problems 
of current interest cannot be treated with the methods of the calculus of 
variations. Instead, the need for a new step forward became apparent. The 
new theory—which incorporates the calculus of variations—is known as the 
theory of optimization, or optimal control. This theory has made it possible 
to solve problems of the new type. Newton’s problem belongs to the category 
of problems of optimal control. Within the framework of this theory Newton’s 
problem has a natural and standard solution. On the other hand, this problem 
has no natural and standard solution within the framework of the calculus 
of variations. Thus this problem put Newton 300 years ahead of his time! 

4. What is a rare medium? Does it exist? Is it not absurd that a solid 
of revolution subject to least resistance should be flat in the front? Who has 
ever dreamed of a torpedo or rocket with a flat head? 

Indeed, neither water, nor the surrounding air, nor the usual liquid or
gaseous media exhibits the properties of Newton’s rare medium. This means
that Newton’s solution is useless for the construction of motor boats,
launches, or ocean liners. But in the mid-fifties, when the era of supersonic
and high-altitude flying machines began, Newton’s physical assumptions and
his aerodynamical problem became scientific frontline news. “Up there” the
medium is “rare.” Newton’s remark about blunted cones turned out to be “of
use in the building of ships” of the supersonic and high-altitude variety.

5. Do geniuses err? In a paper in the journal Quantum (1982, 5, pp. 11- 
18), I posed this question without supplying an answer. But when Andrei 
Kolmogorov happened to be in the editorial office and someone read him the 
article, he demanded that the question be answered in the affirmative. Well, 
maybe. 

Newton’s problem is certainly a remarkable mathematical event. For 250 
years it seemed likely that it had no physical basis and its solution was absurd. 
But the “mistake” of a genius turned out to have been an insight. 

In a word, hasty judgments are sometimes just that. It can happen that the
thought of a genius, which we regard as a mistake, carries within it the imprint
of truth: a truth clear to him but hidden as yet from us.


PART TWO 


Methods of Solution 
of Extremal Problems 


We must make it our goal to find a method 
of solution of all problems...by means of a 
single simple method. 


D’Alembert 


The Ninth Story 


9 


What is a Function? 


This general concept requires that by a function of x 
one should mean the number given for every x and 
one that varies gradually together with x. The value 
of a function may be given by an analytic expression, 
or by a condition that gives the means of testing all 
numbers and of selecting one of them or, finally, the 
dependence may exist and remain unknown. 

The general form of the theory admits of the exis- 
tence of a dependence only in the sense that num- 
bers, one related to the other, should be thought of 
as given together. 


N. I. Lobačevskii


Before entering upon stories about methods of investigation of problems 
of maxima and minima, we will discuss functions for a while. 

The concept of a function is the key concept of mathematical analysis. 
But surprisingly, this concept was not formulated at once. At first it was 
vague and without a reasonably accurate description. The first attempts to 
outline the contours of the function concept were made at the end of the 
seventeenth century by Leibniz and Johann Bernoulli. Leibniz introduced 
the term “function.” Bernoulli associated with this term the notion of “an
expression made up in some way out of a variable magnitude and constants.”
Euler later made Bernoulli’s idea more concrete, defining a function in his
textbooks as an analytic expression made up of a variable magnitude and
of constants. He also introduced the symbolism f(x). Euler admitted the
possibility of calling a function “any curve drawn freehand” and was aware
of cases of functions—the sine function, for example—that can be described 
verbally. 

Let’s consider a simple example of a function that admits different
descriptions. We have in mind the function $y = |x|$. $|x| = \sqrt{x^2} = (x^2)^{1/2}$ is
undoubtedly an (analytic) expression made up of the variable magnitude x
and of constants. As such it is a function as defined by Euler and by Johann 
Bernoulli. But it can also be represented as a “drawn curve” that follows 
the bisectors of the first and second quadrants in a rectangular coordinate 
system. And it can be described in words as the function that is equal to zero 
for x equal to zero, to x for positive x, and to —x for negative x. Quite 
generally, most functions admit different descriptions. 
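
The three descriptions of the same function can be set side by side in a few lines of code. This is only an illustration; the function names are hypothetical, and taking the larger of x and −x is one assumed way of expressing "the curve that follows the two bisectors."

```python
import math


def abs_analytic(x):
    """|x| as an analytic expression: the square root of x squared."""
    return math.sqrt(x * x)


def abs_graph(x):
    """|x| read off the drawn curve: the upper of the lines y = x, y = -x."""
    return max(x, -x)


def abs_verbal(x):
    """|x| described in words: zero at zero, x if positive, -x if negative."""
    if x == 0:
        return 0.0
    return x if x > 0 else -x


for x in (-3.0, -0.5, 0.0, 0.5, 4.0):
    print(x, abs_analytic(x), abs_graph(x), abs_verbal(x))
```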

Which should be the preferred description? This question gave rise to 
many disagreements. Euler thought that the class of functions that are “curves
drawn freehand” is larger than the class of functions given by “analytic ex- 
pressions.” D’Alembert opposed this view and claimed that the two classes 
are the same. 

Daniel Bernoulli advanced what seemed like a paradoxical view of the
concept of function: namely, that an arbitrary periodic function with period
$2\pi$ can be represented as a sum

$$\sum_{k=0}^{\infty}\left(a_k\cos kx + b_k\sin kx\right).$$


The majority of mathematicians felt that this is a rather restricted class of 
functions, more restricted than “analytic expressions,” and, obviously, more 
restricted than arbitrary curves. 

At the beginning of the nineteenth century there began to crystallize the 
idea of a function as a correspondence, a law under which the independent 
variable x is transformed into y, regardless of the nature of such a
correspondence. One of the first scholars to support this view was Lobačevskii.
His thoughts in this connection serve as the epigraph for this story. 

At about the same time similar views emerged in the French and German 
mathematical schools. Textbooks of mathematical analysis began to intro- 
duce definitions of function such as the following: 


Let x and y be the two given variables between whose val- 
ues there exists a certain dependence. In general, one of the 
variables, say x, is regarded as the independent one. The 
value of x can be chosen arbitrarily, but for a given x the 
value of y is not arbitrary. Then we say that y is a function 
of x. 

Vallée-Poussin 


A variable magnitude y is said to be a function of the vari-
able magnitude x if to every value of x there corresponds 
a single definite value of y. 

Nemyckii, Sludskaya, Cerkasov 


Such definitions cannot satisfy those who demand logical rigor (in general, 
the number of such people is not very great). After all, the term “function” 
in these cases is defined in terms of notions that are indeterminate and vague 
(“dependence,” “law,” “correspondence,” and so on). 

The creation of set theory brought with it a measure of calm. Its founda- 
tions were laid at the end of the nineteenth century by Georg Cantor. Now 
everything seemed to have found its proper place. In particular, one now 
defined a function as follows: let X and Y be two sets. The set F of pairs
(x, y), x ∈ X, y ∈ Y, is called a function if for every x ∈ X there is
exactly one y ∈ Y such that (x, y) ∈ F. Then we write y = f(x).*
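
For finite sets the set-theoretic definition can be tested mechanically. The sketch below is an illustration only; the function name and the sample sets are hypothetical. It checks that every pair lies in X × Y and that each element of X appears as a first component with exactly one partner.

```python
def is_function(pairs, X, Y):
    """Return True if the finite set of pairs F is a function from X to Y:
    every x in X occurs in exactly one pair (x, y) with y in Y."""
    images = {}
    for x, y in pairs:
        if x not in X or y not in Y:
            return False              # the pair is not in X x Y
        if x in images and images[x] != y:
            return False              # x would have two different values
        images[x] = y
    return set(images) == set(X)      # no x in X is left without a value


X = {1, 2, 3}
Y = {"a", "b"}
print(is_function({(1, "a"), (2, "a"), (3, "b")}, X, Y))   # a function
print(is_function({(1, "a"), (1, "b"), (2, "a"), (3, "b")}, X, Y))  # 1 has two values
print(is_function({(1, "a"), (2, "b")}, X, Y))              # 3 has no value
```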

The concepts of set theory made a tremendous impression on many mathe- 
maticians who witnessed the birth of the new theory. Hilbert, one of history’s 
greatest mathematicians, had this to say of set theory: 


I think that it is the highest manifestation of mathematical 
genius and one of the greatest achievements of man’s purely 
spiritual activities. 


Almost all contemporary mathematical works make use of the fundamental 
concepts and symbolism of set theory in one form or another. 

As time passed, critical voices were heard. Contradictions emerged and 
heated arguments arose. At the beginning of the twentieth century all of the 
leading mathematicians (Poincaré, Hilbert, Hadamard, Weyl, Brouwer, and 
others) took part in the discussion of problems connected with the crisis in the 
foundations of mathematics. For some mathematicians, the set-theoretic def- 
initions (in particular, the definition of a function) were unacceptably broad. 
These mathematicians were convinced that any functional dependence of in- 
terest from the practical point of view must necessarily be “constructive.” 
In such cases there must be a distinct rule (or, to use a current expression, 
an algorithm) such that given x one can seek the required y. The debate 
gave rise to whole schools of mathematics that rejected set theory. A special 
“constructive” mathematics began to develop, a system at once similar to and 
dissimilar from the mathematics based on set theory with which mathemati- 
cians were already familiar. 

Many scholars were greatly shaken by the crisis connected with the
rejection of the set-theoretic conceptions by some mathematicians. In this
connection Hilbert commented that “No one will expel us from the paradise
created for us by Cantor.” Others heatedly debated this point of view.


*Something close in meaning to this is contained in the last phrase of the quotation from
Lobačevskii that is one of the epigraphs for this story.

Let us now turn to our problem: What is a function? Which is the pre- 
ferred point of view? Where is the truth? A deep rethinking of these ques- 
tions would take us too far afield. What must be said, however, is that all 
sufficiently definite descriptions of the function concept will do for general 
use, that is, for the solution of real-life problems, for physics, and, more 
generally, for applications of mathematics in the natural sciences, in tech- 
nology, in economics, and so on. Throughout this book we will find that the 
kind of understanding of the function concept that began to take shape at 
the very sources of mathematical analysis will suffice for our basic purposes. 
Thus we will think of the function concept “in the manner of Bernoulli” as 
“an expression made up in some manner out of a variable magnitude and 
constants.” 

Functions differ in the number of their variables. We will first discuss 
functions of a single real variable. When referring to a function y = f(x), 
we will always have in mind a specific rule for obtaining the number y from 
the number x. For example, “take a number, square it, add one to the 
square, and take the square root.” This is the description of the function 
$y = \sqrt{1 + x^2}$, which, of course, is an “expression made up of a variable mag-
nitude and constants.” (Note that, to a large extent, Bernoulli’s views agree 
with the views of modern “constructivists”; we need only replace the vague 
term “expression” by the term “algorithm,” which can be given a very pre- 
cise meaning. The meaning in question is, essentially, a precisely formulated 
prescription of what to do with the number x to obtain y.) 
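
The verbal rule quoted above translates step by step into an algorithm. This small sketch (the function name is hypothetical) follows the prescription literally: take a number, square it, add one to the square, and take the square root.

```python
import math


def f(x):
    """Follow the verbal rule literally, one step at a time."""
    squared = x * x            # "take a number, square it"
    plus_one = squared + 1     # "add one to the square"
    return math.sqrt(plus_one) # "and take the square root"


for x in (-2.0, 0.0, 1.0, 3.0):
    y = f(x)
    # Each point (x, y) satisfies y^2 - x^2 = 1 with y > 0.
    print(x, y, y * y - x * x)
```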

Functions of a single variable can be represented by means of graphs. To 
this end we orient the x-axis, as usual, horizontally from left to right and the
y-axis vertically upward. We denote the intersection of the axes by the letter 
O. Then we choose a unit of length for the axes. Next, having chosen (a 
value for) x we lay it off on the x-axis and then lay off a segment of length 
y = f(x) on the perpendicular through the point on the x-axis. The result is 
the graph of the function y = f(x). In particular, the graph of the function 
$y = \sqrt{1 + x^2}$ is one branch of a hyperbola. (See Figure 9.1.)

Now let’s define and represent some of the most important functions of 
one variable. 

The simplest of all functions is the constant function 


y = c.


This function associates to each number x one and the same number c. For 
c = 1, we get the constant function f(x) = 1 represented in Figure 9.2. 
Next in complexity are the linear functions


y = bx. 
FIGURE 9.1


FIGURE 9.2 


Here b is a constant and x is any real number. These functions are represented
by lines through the origin other than the y-axis. For example,
suppose b = 2. The corresponding linear function is y = 2x. Here, for 
x=1, y=2, for x = 1/4, y = 1/2, and so on; for each x the corre- 
sponding y is obtained in a definite manner, namely by multiplying x by 2. 
Figure 9.2 includes a representation of “the simplest” linear function y = x. 
Its graph is the bisector of the angle formed by the coordinate axes. 

By combining constant and linear functions, we obtain functions of the 
form 

y = bx + c.


In high school, these functions are called linear. We think that it is more 
natural to call them affine functions. Affine functions will play a very important
role later in this book.
We consider next the quadratic functions 
$y = ax^2$.

These functions are represented by parabolas passing through the origin. 
Take, for example, a = 1/2. Then for x = 1, y = 1/2, for x = 2, 
y = 2, for x = 10, y = 50, and so on; for each x the corresponding y is
obtained in a definite manner, namely by multiplying x by itself and then 
multiplying the result by 1/2. Figure 9.2 on page 85 shows the function 


$y = x^2$ $(a = 1)$.


By combining linear and quadratic functions, we obtain the quadratic trinomials
$y = ax^2 + bx + c$.

Next come the power functions 

$y = Ax^n$.

Here A is a real constant and n is a nonnegative integer.

The exponential functions 

$y = a^x$

play an important role in analysis. Figure 9.2 shows the function $y = 2^x$.

The functions that are inverses of the power and exponential functions
are very useful. Figure 9.2 shows the functions $y = \sqrt{x}$ and $y = \ln x$. A
singular feature of these functions is that they are not defined for all x. Thus
$y = \sqrt{x}$ is defined only for nonnegative x, and $y = \ln x$ only for positive
x. (See Figure 9.2.)

The functions introduced in high school include the trigonometric func- 
tions. Figure 9.2 shows such a function, namely y = sin x. 

The following is a list of functions that we will constantly deal with here- 
after: 

The power function $y = Ax^n$; this is a constant function for n = 0, a
linear function for n = 1, and a quadratic function for n = 2,

The “nth root of x” function $y = \sqrt[n]{x}$,

The exponential function $y = a^x$,

The trigonometric functions: the sine (function) y = sin x, the cosine 
y = cosx, the tangent y = sinx/cosx = tanx, and the cotangent y = 
cos x/ sin x = cotx. 

These functions can be combined to form various expressions, all of which 
are functions of one variable. Some relevant examples follow: 


$y = \sqrt{a^2 + x^2}$,   $y = x\sqrt{1 - x^2}$,   ...,   $y = \sqrt{b^2 + x^2 - 2bx\cos\alpha}$.
Of course, this list can be extended. We have included only the functions
that we will encounter when solving extremal problems. An example of a 
“scarier” function is 


$y = \sin\left(2^{\sqrt[5]{\log_7(\tan(x + 1) + 1)}}\right)$.


Here I’ve encoded the following rule: take a number, add one to it, take the
tangent of this sum, add one to the result, take the logarithm to the base 7 of
the latter sum, take the fifth root of the resulting number, raise two to the latter
power, and take the sine of this power of two.
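Read as an algorithm, the “scary” rule is just a chain of simple steps. Here is a hedged sketch of that chain; the name `scary` and the treatment of the fifth root of a negative number are my own assumptions, and the function is defined only where the logarithm's argument is positive.

```python
import math

def scary(x: float) -> float:
    """Apply the verbal rule step by step (illustrative helper, not from the text)."""
    s = x + 1.0          # add one to the number
    t = math.tan(s)      # take the tangent of this sum
    u = t + 1.0          # add one to the result (must be positive for the logarithm)
    v = math.log(u, 7)   # take the logarithm to the base 7
    # Real fifth root; copysign keeps the sign when v is negative (an assumption).
    w = math.copysign(abs(v) ** 0.2, v)
    p = 2.0 ** w         # raise two to that power
    return math.sin(p)   # take the sine of this power of two

print(scary(0.0))
```

Every intermediate name corresponds to one clause of the verbal rule, which is the sense in which an expression and an algorithm say the same thing.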

Similar, but somewhat less scary expressions (or rules, or algorithms) exhaust
the content of the notion of a “function of one variable,” an idea that
will be repeatedly encountered later in this story. 

My handling of many important matters has been rather casual. I know 
that you have encountered functions of one variable in high school and I 
count on you to look in your textbook for more information, ask your teacher, 
or think some things through yourself. 

All the same, functions of one variable won’t suffice. We will find it in- 
dispensable to work with functions of two, three, four, a hundred variables 
(and—although you need not worry about it now—with functions of infinitely 
many variables). Never fear! None of this is terribly difficult. 

At first we'll take just one step forward and discuss functions of two vari- 
ables. 

Many of you would be puzzled if asked when you first began to work 
with functions of two variables. Functions of two variables are not taught 
in school, so how could you have worked with them? But, in fact, we have 
all known about functions of two variables from time immemorial. Yes, 
immemorial! None of you can remember the day when, for the first time, 
your father, mother, or a visiting acquaintance asked you a question such as: 
“Here is one apple, and here is another—how many are there in all?” This 
was in your earliest childhood, when you could hardly talk, and long before 
you learned the alphabet. You can’t remember the time when you answered 
this question. But it was at that moment, when you added one apple to 
another apple and got “two” apples, that you first encountered a function of 
two variables, the oldest and best known of such functions, namely addition: 


z = x + y.


The addition function associates to an arbitrary pair of numbers x and y 
their sum z. In particular, it associates to the pair (1, 1) the number 2, to 
the pair (7, -3) the number 4, and so on.

You learned first to add natural numbers, then integers, then real numbers. 
Thus, you associate to an arbitrary pair (x,y) of real numbers x and y 
their sum. Put differently, the addition operation is defined for all pairs 
FIGURE 9.3


(x, y) of real numbers x and y. The point of all this is that you encountered 
functions of two variables far earlier than functions of one variable. 

After addition, you learned subtraction, multiplication, and division. 
These operations are all functions of two variables. Subtraction and multi- 
plication are defined for all pairs (x, y) of real numbers x and y; division 
is defined only for those pairs (x, y) for which y ≠ 0.

Again following Bernoulli, by a function of two variables we will mean an 
expression consisting of variable magnitudes x and y and of constants. 

Functions of two variables z = f(x, y) can also be represented graphically.
To this end we represent the (x, y)-plane and set the z-axis perpendicular
to it. Now, given numbers x and y, we locate (x, y) in the
(x, y)-plane, compute f(x, y), and lay off a segment of length f(x, y) on 
the line parallel to the z-axis and starting from the point (x, y). 

Let’s now define and represent some of the most important functions of 
two variables. 

Again, the simplest function is the constant function 


z = c.


This function associates to each pair (x, y) of real numbers x and y the 
number c. 
Next in complexity are the linear functions


z=ax + by. 


Here a and b are constants and x and y are arbitrary real numbers. To
find z given the pair (x, y) we must multiply x by a and y by b and
add the resulting numbers. Suppose a = 2 and b = 1/2. Then to the pair
(1, 0) there corresponds the number z = 2, to the pair (1, 1) the number
z = 5/2, and to the pair (4, -8) the number z = 4. Figure 9.3 shows the
graph of the linear function z = x+y. The linear functions are represented 
by planes through the origin not perpendicular to the (x, y)-plane. 
Using constant and linear functions, we form functions 


z = ax + by + c.

FIGURE 9.4


We will also call them affine functions. They will play an extremely important 
role later in the book. 
Next come quadratic functions $z = Ax^2 + 2Bxy + Cy^2$ and functions of
the form
$z = Ax^2 + 2Bxy + Cy^2 + ax + by + c$.


Figure 9.4 shows the function z = a y . Here are a few other examples 
of functions of two variables that we will encounter when solving various 
problems: 


z=Ax*(y-Bx),  z=Axy, 


2 2 
ee 2 


3 
a b? 


$z = (x - c)^2 + (y - d)^2$,   $z = \sqrt{(x - c)^2 + (y - d)^2}$.


The fact that enables us to give a visual representation of the character- 
istic features of functions of two variables is that the graph of a remarkable 
function of two variables is always before our eyes and “under our feet.” 

Imagine that you are standing on the ground. Your position in space 
can be described by the triple of numbers (φ, θ, h), where φ and θ are,
respectively, latitude and longitude, and h is altitude above sea level. Thus
h = h(φ, θ); that is, h is a function of (φ, θ). (At this point, recall the
general view of a function that we used as an epigraph for this story.) All that 
we see—hills, hollows, ravines, mountains, mirror-like surfaces of lakes— 
forms the “graph” of this function. 

The graph of this function has many noteworthy points. Here, of course, 
the peaks come first. When we scale a peak we experience the joy of victory, 
the delight of surmounting a difficulty. When travelling in the mountains we 
look for passes that take us from valley to valley. When studying the sea 
bottom we try to find the deepest point in the sea. It is these very points— 
peaks, passes, and hollows—that will properly concern us later. 

We will leave functions of two variables for now and take another step 
forward. We ask: What is a function of three variables? Of course, it is an 
expression made up of three variables x, y, z and constants. Examples? As
many as you wish! The sum $u = x + y + z$, the product $u = x \cdot y \cdot z$, the
constant function $u = d$, the linear function $u = ax + by + cz$, the quadratic
function $u = Ax^2 + 2Bxy + Cy^2 + 2Dyz + 2Gxz + Fz^2$, $u = \sqrt{x^2 + y^2 + z^2}$,
$u = 2^x + \log_a y + \sin z$, and so on, and so forth.

And what is a function of four variables? Obviously, an expression made 
up of four variables x, y, z, u and constants. 

And what is a function of 26 variables? Of course, an expression made up
of the variables a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t,
u, v, w, x, y, z and constants.

And a function of 28 variables? Obviously, an expression made up of the 
variables...but, we’ve run out of letters! Are we to study only functions of 
at most 26 variables? If necessary we can throw in the Greek alphabet and 
some other alphabets to boot. But all this may not be enough. In modern 
economic problems there may be thousands of variables. What must we do? 

The solution is simple. Instead of letters, the variables can be denoted
by the single letter x with subscripts: $x_1, x_2, \ldots, x_n$. Thus a function
of n variables can be defined as an expression made up of n variables
$x_1, x_2, \ldots, x_n$ and constants.

The simplest function of n variables is a constant function y = c. It
associates to every choice of n numbers $(x_1, \ldots, x_n)$ the same number c.

One extremely important class is the class of linear functions 


$y = a_1 x_1 + \ldots + a_n x_n$.


Constants and linear functions can be combined to form the affine func- 
tions 


$y = a_1 x_1 + \ldots + a_n x_n + c$.


We will use functions of this kind to approximate more complicated func- 
tions. 
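Subscripted variables are exactly what a list or tuple provides in a program, so linear and affine functions of n variables take only a few lines. This is a minimal sketch; the function names `linear` and `affine` are illustrative choices of mine.

```python
from typing import Sequence

def linear(a: Sequence[float], x: Sequence[float]) -> float:
    """y = a_1*x_1 + ... + a_n*x_n for coefficient list a and point x."""
    if len(a) != len(x):
        raise ValueError("a and x must have the same number of entries")
    return sum(a_i * x_i for a_i, x_i in zip(a, x))

def affine(a: Sequence[float], x: Sequence[float], c: float) -> float:
    """y = a_1*x_1 + ... + a_n*x_n + c."""
    return linear(a, x) + c

# Two variables, with a = 2 and b = 1/2.
print(linear((2.0, 0.5), (1.0, 1.0)))
# A thousand variables is no harder than two.
print(linear([1.0] * 1000, list(range(1000))))
```

The same code works for any n, which is the practical payoff of writing the variables as a single letter with subscripts.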

This is perhaps the appropriate time to assign the proper names to certain 
functions. We have in mind functions of many variables. By now, you and I 
have encountered such functions many times. This happened, for example, 
in the fifth story in connection with means, where we considered functions 
such as 


Pte RSS 
yoyxi+...42?, (sel cae x, 20. 


We also encountered functions of many variables in the story of the brachistochrone
and in the story about Newton’s problem. In the latter story we encountered
even more striking functions, namely “functions of functions”—for example, 
functions of curves and thus, in effect, functions of infinitely many variables. 
My aim in Part Two is to present a fragment of the general theory of 
extremal problems. As a first step, I have explained what functions of many 
variables are. In the next story I will explain how to pose maximum and
minimum problems for functions of many variables subject to constraints. 


The Tenth Story 


10 


What is an Extremal Problem? 


In the first half of the book we solved a great many maximum and mini- 
mum problems. Some of them had been posed a long time ago, even centuries 
ago, and among their investigators were some of the greatest mathematicians 
of the past—Euclid, Archimedes, Kepler, Fermat, Bernoulli, Leibniz, and 
Newton. The investigations themselves took entirely dissimilar paths, and 
their aims were sometimes achieved only after long periods of meandering. 

I have chosen as an epigraph for this part the words of d’Alembert. Ponder 
them. Is it possible to find a method for solving all problems (including those 
talked about in the first part) in one simple way? 

Everybody realizes that there cannot be one simple rule for solving all 
problems in the world. (Incidentally, d’Alembert had in mind only problems 
in dynamics.) Even the possibility of the existence of a single method of 
solution of all problems discussed in the first half may strike some as doubtful. 
And yet such a method exists. We’ll present it here, in the second half of the 
book. Using this one simple method, we’ll solve all the problems from the 
first half in the same, standard, you might even say routine, way. 

But first we must have a single way of writing down the conditions of 
problems and a general language for discussing problems of such different 
content. This is what the present story is about. 

To begin, we'll once more make precise the meaning of the key words: 
“maximum,” “minimum,” “extremum,” and “optimum.” 

Recall two planimetric problems from the first part. One was discussed 
on various occasions beginning with the first story, the other appeared in the 
fifth story. 
HERON’S PROBLEM. Given two points on the same side of a line, find a point
on that line such that the sum of its distances to the given points is minimal. 
(Refer back to Figure 1.1.) 


KEPLER’S PLANIMETRIC PROBLEM. Inscribe a rectangle of maximal area in
a circle of unit radius. 
These problems differ in that in the first we must find a minimum and in
the second, a maximum. 

Recall that the words “maximum” and “minimum” are of Latin origin. 
They mean “largest” and “least,” respectively. In connection with maximum 
and minimum problems we frequently use two other words of Latin origin. 
One of them is “extremum,” meaning “extreme,” a term that combines the 
concepts of maximum and minimum. (Its use was suggested by the German
mathematician du Bois-Reymond.) The other is the adjective “optimal,”
derived from the Latin optimus, meaning “best” or “perfect.” The term “op- 
timal” has been universally adopted in recent years. 

The theory of problems concerned with finding largest and least magni- 
tudes is called the theory of extremal problems or optimization theory. If 
the problem involves finding the best influence on some processes and phe- 
nomena that man can control within some bounds, then we include it in the 
section of the theory of extremal problems called optimal control. 

The Heron and Kepler problems were stated a few lines earlier using words 
rather than formulas. The same applies to all problems in the first part: 
the terminology used was geometric in geometric problems (the isoperimetric,
Steiner, and other problems), mechanical in mechanical problems (the
brachistochrone and Newton problems), and algebraic in algebraic problems
(the Tartaglia problem, problems involving inequalities, and others). No formulas
appeared in these problems. This was only proper. Extremal problems
arising in mathematics, in the natural sciences, or in practical enterprises are 
traditionally stated first without formulas, using the terminology of the do- 
main in which they arise. In order to be able to utilize a general theory, it is 
necessary to effect a translation of the statements of the problems from each 
specific language to the language of mathematics. Such a translation is called 
a formalization. 

We will use examples to illustrate the process of formalization. Let’s begin 
with Heron’s problem. 

We take the given line as the x-axis and draw the y-axis through the 
point A perpendicular to the x-axis. (Refer back to Figure 1.1.) Let the 
coordinates of the points A and B be (0, a) and (d, b), respectively. On 
the x-axis we take a point D with coordinates (x, 0). Then the sum of the 


distances from A to D and from D to B is $\sqrt{a^2 + x^2} + \sqrt{b^2 + (d - x)^2}$.
This results in the following problem: Find the least value of the function


$f(x) = \sqrt{a^2 + x^2} + \sqrt{b^2 + (d - x)^2}$


for all values of x. 
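Once Heron's problem is a formula, a computer can attack it without any geometry. The sketch below minimizes f numerically by ternary search, which is my own choice of method and is valid here because f is convex. It then compares the result with x = ad/(a + b), the point obtained by setting the derivative of f to zero (equal angles at D). The data a = 1, b = 2, d = 3 are arbitrary sample values.

```python
import math

def heron_sum(x: float, a: float, b: float, d: float) -> float:
    """Sum of distances from A = (0, a) and B = (d, b) to D = (x, 0)."""
    return math.hypot(a, x) + math.hypot(b, d - x)

def ternary_min(f, lo: float, hi: float, iterations: int = 200) -> float:
    """Locate the minimizer of a convex function f on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2   # the minimizer cannot lie to the right of m2
        else:
            lo = m1   # the minimizer cannot lie to the left of m1
    return 0.5 * (lo + hi)

a, b, d = 1.0, 2.0, 3.0   # arbitrary sample data
x_numeric = ternary_min(lambda x: heron_sum(x, a, b, d), -10.0, 10.0)
x_closed_form = a * d / (a + b)
print(x_numeric, x_closed_form, heron_sum(x_numeric, a, b, d))
```

The two values of x should agree to many digits, and the minimal sum should match the straight-line distance from A to the mirror image of B.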

This formalization is very natural and almost forces itself upon us. But, 
in general, a problem can have many formalizations. We will illustrate this 
by means of the planimetric problem of Kepler. 
We orient the $x_1$- and $x_2$-axes parallel to the sides of the rectangle. The
equation of the unit circle is $x_1^2 + x_2^2 = 1$. Let $(x_1, x_2)$ be the coordinates
of the vertex of the rectangle in the first quadrant. (See Figure 10.1.) Then
the area of the rectangle is $4x_1x_2$. This yields the following formalization:
Find the largest value of the function of two variables


$f_0(x_1, x_2) = 4x_1x_2$,
subject to the conditions
$f_1(x_1, x_2) = x_1^2 + x_2^2 - 1 = 0$,   $f_2(x_1, x_2) = x_1 \ge 0$,   $f_3(x_1, x_2) = x_2 \ge 0$.


Note that we can dispense with the inequalities $x_1 \ge 0$ and $x_2 \ge 0$. Then
it is easy to see that the problem of finding the largest value of the function


$f_0(x_1, x_2) = 4x_1x_2$,
subject to the condition
$f_1(x_1, x_2) = x_1^2 + x_2^2 - 1 = 0$,


is also a formalization of the planimetric problem of Kepler. 

If we use the equation $f_1(x_1, x_2) = x_1^2 + x_2^2 - 1 = 0$ to express $x_2$ in
terms of $x_1$ and substitute the result in the expression for $f_0$, then (after
replacing $x_1$ by x) we obtain yet another formalization: Find the largest
value of the function $g(x) = 4x\sqrt{1 - x^2}$ subject to the condition $0 \le x \le 1$
or the condition $|x| \le 1$ (these conditions are dictated by the domain of
definition of the function and by the sense of the variable x).
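That different formalizations describe the same problem can be checked by brute force. In the sketch below the constrained two-variable version is sampled along the quarter circle using the angle t as a parameter (a device of mine, not used in the text), and the one-variable version is sampled on a grid in [0, 1]. Both searches should report essentially the same largest area.

```python
import math

def area_two_variables(x1: float, x2: float) -> float:
    """Area 4*x1*x2 of the rectangle with vertex (x1, x2) on the unit circle."""
    return 4.0 * x1 * x2

def g(x: float) -> float:
    """One-variable formalization g(x) = 4x*sqrt(1 - x^2), 0 <= x <= 1."""
    return 4.0 * x * math.sqrt(1.0 - x * x)

N = 10_000  # number of grid steps (arbitrary)

# Constrained formalization: points (cos t, sin t) satisfy x1^2 + x2^2 = 1.
best_constrained = max(
    area_two_variables(math.cos(t), math.sin(t))
    for t in (0.5 * math.pi * k / N for k in range(N + 1))
)

# One-variable formalization on a grid in [0, 1].
best_one_variable = max(g(k / N) for k in range(N + 1))

print(best_constrained, best_one_variable)
```

A grid search is crude, but it is enough to see that the choice of formalization changes the work to be done and not the answer.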

We see that there are different formalizations of the same problem. The 
ease of solution of a problem often depends on its clever formalization. For- 
malization is an art. It must be learned, and the best way to learn it is to 
solve practical problems. 

We have talked about Kepler’s planimetric problem. There are different 
ways of posing similar problems in space. We discussed one formulation in 
the first part: to inscribe a cylinder of maximal volume in a unit sphere. Another
formulation is to inscribe a rectangular parallelepiped of largest volume
in a unit sphere. This formulation yields the following formalization: Find
the largest value of the function of three variables

$f_0(x_1, x_2, x_3) = 8x_1x_2x_3$,
subject to the condition

$f_1(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2 - 1 = 0$.

(Recall that Kepler considered the special case of this problem when $x_1 = x_2$.)

The following problem leads to the same formalization (without the num- 
ber 8): Find the largest value of the product of three numbers subject to the 
condition that the sum of their squares is equal to a given number. This prob- 
lem can be generalized by replacing three with five, 10, or arbitrarily many 
numbers. The latter problem is formalized as follows: Find the largest value 
of the function of n variables


$f_0(x_1, \ldots, x_n) = x_1 x_2 \cdots x_n$,
subject to the condition
$f_1(x_1, \ldots, x_n) = x_1^2 + \cdots + x_n^2 - 1 = 0$.

We will now try to describe with a measure of definiteness all the elements 
of a correctly formalized extremal problem. It must, necessarily, involve a 
function (of n variables, say) for which we are to find the largest or least
value (a function to be maximized or minimized), and a constraint given by 
a number of equalities and inequalities (with the same variables). 

What follows is a list of some minimized or maximized functions and 


functions that define constraints, chosen from among functions already en- 
countered or to be encountered. 


$f_1(x) = \sqrt{a^2 + x^2} + \sqrt{b^2 + (d - x)^2}$     Heron’s problem


f(x) = Se Is ae problem of reflection of light 
1 2 
H as 
f(x) = pxlb — x) Euclid’s problem 
$g_1(x_1, x_2) = 4x_1x_2$,
$g_2(x_1, x_2) = x_1^2 + x_2^2 - 1$     Kepler’s planimetric problem
$h_1(x_1, x_2, x_3) = 8x_1x_2x_3$,
$h_2(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2 - 1$     the problem of a parallelepiped inscribed in a sphere
$F_1(x_1, \ldots, x_n) = x_1 x_2 \cdots x_n$
Here $f_1$, $f_2$, and $f_3$ are functions of one variable, $g_1$ and $g_2$ of two
variables, $h_1$ and $h_2$ of three variables, and $F_1$ and $F_2$ of n variables.

Thus to formalize an extremal problem is to describe precisely a function
(denoted by $f_0$) to be minimized or maximized and a constraint (denoted by
C). The constraint is usually given by equalities and inequalities.

We will use the abbreviated notation* 


$(p)$   $f_0(x) \to \min (\max)$ for x in C,


for the following formalized problem: “Find the minimum (maximum) of
the function $f_0(x)$ subject to the condition that x is in C.” The points in C
are called admissible. If there are no constraints, then (p) is called a problem
without constraints. For example, the abbreviated description of one of the
formalizations of Kepler’s planimetric problem is


$(p_1)$   $f_0(x_1, x_2) = 4x_1x_2 \to \max$,   $f_1(x_1, x_2) = x_1^2 + x_2^2 - 1 = 0$,


and of Heron’s problem, simply,


$(p_2)$   $f_0(x) = \sqrt{a^2 + x^2} + \sqrt{b^2 + (d - x)^2} \to \min$.


$(p_1)$ is a problem with a constraint involving an equality and $(p_2)$ is a problem
without constraints.

An admissible point $\hat{x}$ is called an absolute minimum (maximum) of a
problem (p) if $f_0(x) \ge f_0(\hat{x})$ for every x in C (if $f_0(x) \le f_0(\hat{x})$ for every x
in C). An absolute minimum (maximum) of a problem is called a solution
of the problem. Our aim is to find a solution.

To find a solution we will resort to finding so-called local extrema. 

When we get to the top of a hill or knoll in an otherwise flat locality, then 
we are at its highest point. But this doesn’t mean that we have solved the 
problem of finding the highest point above sea level. The only people who 
have “solved” the latter problem are those who have scaled Mount Everest. 
That’s the difference between absolute and local extrema. 

We give a precise definition of the latter concept. We will say that a point
$\hat{x} = (\hat{x}_1, \ldots, \hat{x}_n)$ yields a local minimum (maximum) for a problem (p) if
there is a number $\varepsilon > 0$ such that for all points $x = (x_1, \ldots, x_n)$ in C for
which


$\sqrt{(x_1 - \hat{x}_1)^2 + \cdots + (x_n - \hat{x}_n)^2} < \varepsilon$,


we have the inequality


$f_0(x) \ge f_0(\hat{x})$   ($f_0(x) \le f_0(\hat{x})$).


*The letter p is the first letter in the word “problem.” The letter x stands for the
n-tuple $(x_1, \ldots, x_n)$. Hereafter we sometimes speak of “the point x.” If $x = (x_1, \ldots, x_n)$
and $\alpha$ is a number, then $\alpha x = (\alpha x_1, \ldots, \alpha x_n)$, and if $x' = (x'_1, \ldots, x'_n)$, then $x + x' =
(x_1 + x'_1, \ldots, x_n + x'_n)$.
(In other words, if the value of the function at any admissible point “near” $\hat{x}$
is not less than (does not exceed) $f_0(\hat{x})$.)
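The difference between local and absolute minima is easy to see on a sampled function. The sketch below uses the polynomial x^4 - 3x^2 + x on [-2, 2], an example of my own choosing with one shallow valley and one deep one. It marks every grid point that is no higher than its two neighbours, then picks out the lowest point overall.

```python
def f(x: float) -> float:
    """Sample function with two valleys (illustrative, not from the text)."""
    return x ** 4 - 3.0 * x ** 2 + x

# Grid on [-2, 2] with step 0.001.
xs = [-2.0 + 0.001 * k for k in range(4001)]
ys = [f(x) for x in xs]

# A grid point is a local minimum if it is no higher than both neighbours.
local_minima = [
    (xs[i], ys[i])
    for i in range(1, len(xs) - 1)
    if ys[i] < ys[i - 1] and ys[i] <= ys[i + 1]
]

# The absolute minimum is the lowest value over the whole grid.
absolute_value, absolute_x = min(zip(ys, xs))

print(local_minima)
print(absolute_x, absolute_value)
```

Both valleys pass the local test, but only one of them is the “Mount Everest” of this problem (turned upside down).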

We will end this story with a formalization of the transportation problem 
that we mentioned casually in the first story. Recall that in this problem we 
must set up a shipping schedule for sending a certain product from supply 
centers to stores at minimal transportation cost. Let $a_i$ denote the number
of units of freight at the ith supply center and m the number of centers. Let
$b_j$, $j = 1, \ldots, n$, denote the number of units of freight required by the jth
store. Finally, let $c_{ij}$ denote the cost of transporting a unit of freight from
the ith supply center to the jth store, and let $x_{ij}$ denote the number of units
shipped from the ith supply center to the jth store. The amount of freight transported
from the ith base is $x_{i1} + \cdots + x_{in}$, and this number must not exceed $a_i$ (no
supply center can supply more than it has in stock). The amount of freight
transported to the jth store is $x_{1j} + \cdots + x_{mj}$, and this number must be
exactly equal to the requirement $b_j$ of the store. This implies the following
formalization of the transportation problem:


$c_{11}x_{11} + c_{12}x_{12} + \cdots + c_{mn}x_{mn} \to \min$,

$x_{i1} + \cdots + x_{in} \le a_i$,   $i = 1, \ldots, m$,
$x_{1j} + \cdots + x_{mj} = b_j$,   $j = 1, \ldots, n$,
$x_{ij} \ge 0$,   $1 \le i \le m$,   $1 \le j \le n$.


We note that in the transportation problem the function to be minimized as 
well as the constraints are given by means of linear functions. The section of 
the theory of extremal problems where one studies extrema of linear functions 
subject to linear constraints is called linear programming. It includes the
transportation problem. 
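For a tiny instance the transportation problem can be solved by sheer enumeration, which makes the roles of the constraints concrete. The sketch below is only an illustration: the supplies, demands, and costs are invented, shipments are restricted to whole units (an extra assumption of mine), and real problems are solved by linear programming methods rather than by trying every plan.

```python
from itertools import product

supply = [3, 4]            # a_i: stock at each supply center (invented data)
demand = [2, 3]            # b_j: requirement of each store (invented data)
cost = [[1, 4], [2, 3]]    # c_ij: cost per unit from center i to store j

m, n = len(supply), len(demand)
best_cost, best_plan = None, None

# x_ij can never usefully exceed b_j, so try 0..b_j for every pair (i, j).
ranges = [range(demand[j] + 1) for i in range(m) for j in range(n)]
for flat in product(*ranges):
    plan = [flat[i * n:(i + 1) * n] for i in range(m)]
    # Row constraints: center i ships at most a_i.
    if any(sum(plan[i]) > supply[i] for i in range(m)):
        continue
    # Column constraints: store j receives exactly b_j.
    if any(sum(plan[i][j] for i in range(m)) != demand[j] for j in range(n)):
        continue
    total = sum(cost[i][j] * plan[i][j] for i in range(m) for j in range(n))
    if best_cost is None or total < best_cost:
        best_cost, best_plan = total, plan

print(best_cost, best_plan)
```

The function being minimized and every constraint checked in the loop are linear in the unknowns x_ij, which is what places the problem in linear programming.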

The next story deals with extrema of functions of one variable. This topic 
is now covered in high school, but we won’t be daunted by the prospect of a 
measure of repetition. 


The Eleventh Story 
Extrema of Functions of One Variable


When a quantity is greatest or least, at that moment 
its flow neither increases nor decreases. 


I. Newton 


This story and the next one have the same two-part structure. In the first 
part of each story we present a solution method without proof but with some 
explanations and comments. In the second half, we give exact definitions 
and some proofs. To master the first part the reader needs to know only the 
concepts of “limit,” “continuous function,” and “derivative.” 


1. First, let’s examine a method of solution of extremal problems for func- 
tions of one variable of the following type: 


$(p)$   $f_0(x) \to \min (\max)$,   $a \le x \le b$.


The endpoints a and b in (p) can be infinite. This means that we will consider extrema of
functions $f_0$ on a finite interval, on a ray, or on the totality of real numbers.
EXAMPLES. 


$(p_1)$   $f_0(x) = \sqrt{a^2 + x^2} + \sqrt{b^2 + (d - x)^2} \to \min$,


$(p_2)$   $f_0(x) = 4x\sqrt{1 - x^2} \to \max$,   $-1 \le x \le 1$.


Recall that $(p_1)$ is a formalization of Heron’s problem and $(p_2)$ is a formalization
of Kepler’s planimetric problem (both formalizations were given
in the tenth story).
Not all problems are solvable. For example, we have already considered
the following unconstrained problem: 


$(p_3)$   $f_0(x) = -\dfrac{1}{1 + x^2} \to \max$.


The function $f_0(x) < 0$ and there is no point $\hat{x}$ such that $f_0(\hat{x}) = 0$. On
the other hand, if $x_n = n$, $n = 1, 2, \ldots$, then $f_0(x_n) \to 0$. This means that
$(p_3)$ has no maximum, that is, there is no point $\hat{x}$ such that $f_0(x) \le f_0(\hat{x})$
for all x.

While maxima and minima need not always exist, the following theorem 
of Weierstrass guarantees the existence of solutions in a tremendous number
of cases.


THE THEOREM OF WEIERSTRASS. Let $f_0(x)$ be a continuous function on a
finite interval [a, b]. Then there exist solutions of the problems


$(p_{\min})$   $f_0(x) \to \min$,   $a \le x \le b$,
and
$(p_{\max})$   $f_0(x) \to \max$,   $a \le x \le b$.


An immediate consequence of this theorem is the existence of a solution of
$(p_2)$. This cannot yet be claimed for $(p_1)$, where the function is considered
on the whole line rather than on a finite interval.

The theorem of Weierstrass implies a corollary that will allow us, among
other things, to prove the existence of a solution for $(p_1)$.


COROLLARY. Let $f_0$ be continuous on the whole line. If $\lim_{x \to \infty} f_0(x) =
\lim_{x \to -\infty} f_0(x) = \infty$, then the unconstrained problem


$f_0(x) \to \min$
has a solution.


We will also encounter the case in which $f_0$ is continuous on a ray $a \le
x < \infty$ or $a < x < \infty$. If in the first case $\lim_{x \to \infty} f_0(x) = \infty$ and in the
second case $\lim_{x \to a} f_0(x) = \lim_{x \to \infty} f_0(x) = \infty$, then the function $f_0$ attains
its minimum on the corresponding ray.

To find the solution of the problem (p), we'll use a method first applied 
by Fermat. Before we solve the problem, however, we recall a definition 
introduced in the previous story. Let $f_0$ be a function defined on an interval
$a \le x \le b$ and let $\hat{x}$ be a point in that interval. We say that $\hat{x}$ yields a local
minimum (maximum) in (p) if there is an $\varepsilon > 0$ such that for all x in
[a, b] for which $|x - \hat{x}| < \varepsilon$, we have the inequality $f_0(x) \ge f_0(\hat{x})$ ($f_0(x) \le
f_0(\hat{x})$). We sometimes say more simply that $\hat{x}$ yields a local extremum of
the function $f_0$.

We have the following theorem. 
THEOREM OF FERMAT. Let the function $f_0$ be differentiable at the point $\hat{x}$.
If $\hat{x}$ yields a local extremum (minimum or maximum) of $f_0$, then $f_0'(\hat{x}) = 0$.


Points such that $f_0'(x) = 0$ are called stationary points. Stationary points
and endpoints are called critical points.

The relation $f_0'(\hat{x}) = 0$ is only a necessary condition for an extremum.
Thus $\hat{x} = 0$ is a stationary point for the function $f_0(x) = x^3$ but yields
neither a local maximum nor a local minimum.

Fermat’s theorem gives rise to the following method for solving one-dimen- 
sional problems. We will divide the process into four stages. 


FIRST STAGE. This stage involves the formalization of the problem. If possible,
the problem must be reduced to the form


$(p)$   $f_0(x) \to \min (\max)$,   $a \le x \le b$.

SECOND STAGE. This stage includes setting down the necessary condition
$f_0'(x) = 0$.

THIRD STAGE. This stage involves finding all stationary points.


FOURTH STAGE. This stage consists of sorting all critical values of $f_0$ and
choosing the least (largest) of them.


The Weierstrass and Fermat theorems imply that if a function $f_0$ satisfies
the conditions of Weierstrass’ theorem (or its corollaries) on [a, b] and if
the function is differentiable at the interior points x of the interval [a, b]
(for a < x < b), then the method outlined previously yields a solution of
the problem. 
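The four stages can be mimicked numerically. The sketch below is an illustration under assumptions of my own: the interval [a, b] is finite, the derivative is supplied by hand, and stationary points are located by scanning for sign changes of the derivative and refining them by bisection. It is applied to the one-variable form of Kepler's planimetric problem, g(x) = 4x√(1 - x²) on [0, 1].

```python
import math

def stationary_points(df, a, b, samples=1000, tol=1e-12):
    """Zeros of df strictly inside (a, b): grid scan plus bisection."""
    step = (b - a) / samples
    xs = [a + step * k for k in range(1, samples)]  # endpoints are skipped
    roots = []
    for left, right in zip(xs, xs[1:]):
        d_left, d_right = df(left), df(right)
        if d_left == 0.0:
            roots.append(left)
        elif d_left * d_right < 0.0:
            lo, hi = left, right
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if df(lo) * df(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    if df(xs[-1]) == 0.0:
        roots.append(xs[-1])
    return roots

def solve(f, df, a, b, kind="min"):
    """Stages 2-4: collect critical points, compare the values of f there."""
    critical = [a, b] + stationary_points(df, a, b)
    choose = max if kind == "max" else min
    x_best = choose(critical, key=f)
    return x_best, f(x_best)

# Stage 1 (formalization): g(x) = 4x*sqrt(1 - x^2) -> max, 0 <= x <= 1.
g = lambda x: 4.0 * x * math.sqrt(1.0 - x * x)
dg = lambda x: 4.0 * (1.0 - 2.0 * x * x) / math.sqrt(1.0 - x * x)

x_best, value = solve(g, dg, 0.0, 1.0, kind="max")
print(x_best, value)
```

The endpoints are compared alongside the stationary point, which is the reason critical points, and not only stationary points, enter the fourth stage.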

One fact that we will use in our solution is that if the segment [a, b] is
finite, and the function $f_0$ is continuous on [a, b] and differentiable at its
interior points x, a < x < b, then the solution is found among the critical
points (that is, at a point where the derivative is zero or at an endpoint). 

This shows that in order to apply the rule just given we must know how to 
differentiate. To facilitate this procedure we adduce a table (see Table 11.1 
on page 102) of derivatives of basic functions. 

In addition to Table 11.1, it helps to remember the following formulas: 


(7)   $(f + g)' = f' + g'$,   $(fg)' = f'g + fg'$,   $\left(\dfrac{f}{g}\right)' = \dfrac{f'g - fg'}{g^2}$.

In addition, we will frequently encounter functions of the form $h(x) =
f(g(x))$. You should remember and learn to use the following formula (“the
chain rule”) for the derivative of such composite functions:


(8)   $h'(x) = f'(g(x))\,g'(x)$.

EXAMPLE. $h(x) = \sqrt{a^2 + x^2}$. Here $h(x) = f(g(x))$, where $f(u) = \sqrt{u} =
u^{1/2}$, $g(x) = a^2 + x^2$. Using formula (1) from the table and formulas (7)
and (8) we obtain

$h'(x) = \dfrac{1}{2}(a^2 + x^2)^{-1/2} \cdot 2x = \dfrac{2x}{2\sqrt{a^2 + x^2}} = \dfrac{x}{\sqrt{a^2 + x^2}}$.

TABLE 11.1. DERIVATIVES

      f(x)                                      f'(x)
(1)   $x^{\alpha}$ ($\alpha \ne 0$, $x > 0$)    $\alpha x^{\alpha - 1}$
(2)   $a^x$ ($a \ne 1$, $a > 0$)                $a^x \ln a$
(3)   $\log_a x$ ($a \ne 1$)                    $\dfrac{1}{x \ln a}$
(4)   $\ln x$                                   $\dfrac{1}{x}$
(5)   $\sin x$                                  $\cos x$
(6)   $\cos x$                                  $-\sin x$
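Differentiation formulas are easy to misremember, and a quick numerical check can catch a slip. The sketch below compares the standard derivatives of the basic functions (power, exponential, logarithm, sine, cosine) and the chain-rule result for √(a² + x²) with a central difference quotient. The sample point, the step size, and the constants α = 2.5 and a = 3 are arbitrary choices of mine.

```python
import math

def central_difference(f, x: float, h: float = 1e-6) -> float:
    """Approximate f'(x) by the symmetric quotient (f(x+h) - f(x-h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

alpha, a = 2.5, 3.0   # arbitrary sample constants

# (name, function, claimed derivative)
table = [
    ("x^alpha", lambda x: x ** alpha, lambda x: alpha * x ** (alpha - 1.0)),
    ("a^x", lambda x: a ** x, lambda x: a ** x * math.log(a)),
    ("log_a x", lambda x: math.log(x, a), lambda x: 1.0 / (x * math.log(a))),
    ("ln x", math.log, lambda x: 1.0 / x),
    ("sin x", math.sin, math.cos),
    ("cos x", math.cos, lambda x: -math.sin(x)),
    # Chain-rule example: h(x) = sqrt(a^2 + x^2), h'(x) = x / sqrt(a^2 + x^2).
    ("sqrt(a^2+x^2)", lambda x: math.sqrt(a * a + x * x),
     lambda x: x / math.sqrt(a * a + x * x)),
]

x0 = 1.3   # arbitrary positive sample point
errors = {name: abs(central_difference(f, x0) - df(x0)) for name, f, df in table}
for name, err in errors.items():
    print(f"{name:>14}: error {err:.2e}")
```

Agreement at one point proves nothing, of course, but a large discrepancy is a reliable sign of a wrong formula.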

We will conclude this section with a few words about convex functions. 
These functions play a very important role in the theory of extremal problems, 
and we'll find it necessary to recall this topic many times. Now let’s give a 
definition of a convex function of one variable. 

You may have encountered the notion of convexity already in high school. 
Recall that a figure is called convex if it contains the interval between any 
two of its points. Viewed as part of the plane, any triangle is a convex figure, 
but there are nonconvex quadrilaterals. (See Figure 11.1.) 

One can give three equivalent definitions of a convex function. One
definition is that a function $y = f(x)$ is convex if for any chord joining two
points on its graph, the part of the graph corresponding to the intermediate
points lies on or below the chord. A second definition is that $f$ is convex if the
set of points above the graph is convex. The third definition of convexity
of $f$ requires that for any numbers $x_1$ and $x_2$ and any $\alpha$, $0 \le \alpha \le 1$, the
following inequality (Jensen’s inequality) holds:

$$f(\alpha x_1 + (1-\alpha)x_2) \le \alpha f(x_1) + (1-\alpha) f(x_2).$$

FIGURE 11.1


All linear functions, functions of the form $y = bx + c$ (affine functions),
and quadratic trinomials $y = ax^2 + bx + c$ with $a > 0$ are convex. Of
the functions $y = |x|^p$ only those with $p \ge 1$ are convex. The function
$y = \sqrt{h^2 + x^2}$ is convex for all $h$. Not all convex functions are differentiable
everywhere. For example, the function $y = |x|$ is not differentiable at zero.
But if a convex function is differentiable, then its derivative is an increasing
(more precisely, nondecreasing) function.
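Jensen’s inequality lends itself to a quick numerical experiment. The sketch below, in Python, samples random triples $(x_1, x_2, \alpha)$ and reports whether any of them violates the inequality. The function name `satisfies_jensen`, the sampling range, and the tolerance are assumptions made for illustration; a sampled test can expose non-convexity but cannot prove convexity.

```python
import math
import random

def satisfies_jensen(func, lo, hi, trials=10_000, tol=1e-9, seed=0):
    """Return True if no sampled triple (x1, x2, alpha) violates Jensen's inequality."""
    rng = random.Random(seed)
    for _ in range(trials):
        x1 = rng.uniform(lo, hi)
        x2 = rng.uniform(lo, hi)
        alpha = rng.random()
        left = func(alpha * x1 + (1 - alpha) * x2)
        right = alpha * func(x1) + (1 - alpha) * func(x2)
        if left > right + tol:  # tolerance absorbs rounding errors
            return False
    return True

print(satisfies_jensen(lambda x: math.sqrt(4.0 + x * x), -10, 10))  # sqrt(h^2 + x^2), h = 2
print(satisfies_jensen(lambda x: abs(x) ** 1.5, -10, 10))           # |x|^p with p >= 1
print(satisfies_jensen(lambda x: abs(x) ** 0.5, -10, 10))           # |x|^p with p < 1
```

The first two functions are convex according to the text, so no violation should be found; for $|x|^{1/2}$ a violating triple turns up almost immediately.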


2. In the first half of this story we have described a rule for the solution of 
problems. This rule is easy to remember and, using it, we can immediately 
solve problems (which is what we will be doing in the thirteenth story). We 
are sure, however, that many readers will want to know about the origin of 
this rule. We will explain this not once but twice. You might easily ask why. 

I think that readers who enjoy scientific and popular-scientific literature 
fall into two categories. One category consists of the majority of readers who 
try to understand just the fundamental ideas. They like presentations that 
are expressive, if not quite rigorous, and don’t complain if they notice that 
apparently inessential details are missing. It is this category of readers that 
I had in mind in the first half of this story and will have in mind in the first 
half of the next story. 

But a writer must not forget those readers who are not satisfied with the 
description of general ideas alone, readers who insist on getting to the essence 
of ideas, to the very heart of the matter, if possible. The concluding part of 
this section has been written with this category of readers in mind. In this 
part we will try to be as precise and brief as possible. 

Imagine driving along a rectilinear highway. (See Figure 11.2.) At every
moment your car is at some definite distance from some initial point. This
means that the location of the car can be given at every moment $t$ by a single
number $s(t)$. In this way we obtain a function of time: $s(t)$ is the distance
at time $t$ from the car to the initial point.

Now look at the speedometer. It indicates velocity. We denote the velocity
of the car at time $t$ by $v(t)$. From courses in physics and mathematics we
know that velocity is the time derivative of the distance function $s(t)$:

$$v(t) = \frac{ds}{dt} = s'(t).$$


(Some say that Lord Kelvin, one of the best physicists of the nineteenth 
century, claimed the opposite. He would say something like “Don’t bother 
me with your mathematics: the derivative is velocity!”) 

If at a certain moment the velocity is not zero—assume, for definiteness, 
that it is positive, as shown in Figure 11.2 on p. 103—then we will be fur- 
ther away from the initial point during the subsequent moments, just as we 
were closer to it in the preceding moments. This means that at the moment 
in question the distance function s(t) can have neither a maximum nor a 
minimum. It follows that at a maximum or minimum point the velocity must
be zero. But this is just what Fermat’s theorem is about. 

Now we say the same thing in a more rigorous way. Let’s begin with a 
precise definition of the derivative. We could use the definition presented 
in high school, but this would lead to difficulties in subsequent stories, when 
we talk of derivatives of functions of many variables. Thus, we will give a 
definition that is equally applicable in the finite-dimensional case and in the 
infinite-dimensional case (which we’ll take up later). 

What does it mean to say that a function $f$ is differentiable at a given
point $x_0$ (or, equivalently, has a derivative at $x_0$)? If we avoid formulas,
then we can say that this means that the function $f(x_0 + x) - f(x_0)$ is closely
approximated by a linear function. If we are to be precise then this means
the following.

DEFINITION. A function $y = f(x)$, defined on an interval $[a, b]$ that
contains in its interior a point $x_0$ ($a < x_0 < b$), is said to be differentiable
at $x_0$ (or, equivalently, to have a derivative at $x_0$) if there exists a linear
function $y = kx$ such that

$$f(x_0 + x) - f(x_0) = kx + r(x),$$

where $\lim_{x \to 0} |r(x)|/|x| = 0$ (or, as is sometimes said, $r(x)/x$ is an
infinitesimal).

An immediate consequence of our definition is that

$$k = \lim_{x \to 0} \frac{f(x_0 + x) - f(x_0)}{x},$$

which means that the number $k$ in the definition is uniquely determined.
This number is called the derivative of $f$ at the point $x_0$ and is denoted by
$f'(x_0)$.

The geometric sense of the derivative is that the line that is the graph of
the function $y = f'(x_0)(x - x_0) + f(x_0)$ (this line passes through the point
$(x_0, f(x_0))$ and its slope is equal to the derivative $f'(x_0)$) is tangent to the
graph of the function $y = f(x)$. (See Figure 11.3.)

FIGURE 11.3


EXAMPLE 1. The quadratic trinomial $y = ax^2 + bx + c$ is differentiable
everywhere and its derivative at $x_0$ is equal to $2ax_0 + b$. We will check this
in the case of the function $f(x) = x^2$. We have

$$f(x_0 + x) - f(x_0) = (x_0 + x)^2 - x_0^2 = 2x_0 x + x^2.$$

Here $2x_0 x$ is a linear function and $r(x) = x^2$. Since $\lim_{x \to 0} |r(x)|/|x|
= 0$, $f$ is differentiable at $x_0$ and $f'(x_0) = 2x_0$.
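The role of the remainder $r(x)$ in the definition can be made tangible with a small computation. The Python sketch below evaluates $|r(x)|/|x|$ for $f(x) = x^2$ with the correct slope $k = 2x_0$ and, for contrast, with a deliberately wrong slope $k = 2x_0 + 1$. The point $x_0 = 1.5$ and the helper name `remainder_ratio` are assumptions chosen for illustration.

```python
def remainder_ratio(f, k, x0, x):
    """|r(x)| / |x|, where r(x) = f(x0 + x) - f(x0) - k*x."""
    r = f(x0 + x) - f(x0) - k * x
    return abs(r) / abs(x)

def square(t):
    return t * t

x0 = 1.5  # sample point, chosen only for illustration
for x in (1e-1, 1e-2, 1e-3, 1e-4):
    with_derivative = remainder_ratio(square, 2 * x0, x0, x)       # k = f'(x0)
    with_wrong_slope = remainder_ratio(square, 2 * x0 + 1, x0, x)  # k != f'(x0)
    print(x, with_derivative, with_wrong_slope)
```

With $k = 2x_0$ the ratio equals $|x|$ up to rounding and shrinks with $x$; with the wrong slope it stays close to 1, which is exactly why $k$ is uniquely determined.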

From the list given earlier in this story, we know that the elementary
functions $x^a$ ($a \neq 0$), $\sin x$, $\cos x$, $\log_a x$ are differentiable wherever they
are defined.

Let’s look at an example of a function that is not differentiable at some 
point. 

EXAMPLE 2. The function $y = |x|$ is not differentiable at zero. In fact,
take any linear function $y = kx$. Assume for definiteness that $k \le 0$. Put

$$r(x) = f(x) - f(0) - kx = |x| - kx = \begin{cases} x + |k|x, & x \ge 0, \\ -x + |k|x, & x < 0. \end{cases}$$

This means that $\lim_{x \to 0,\, x > 0} |r(x)|/|x| = 1 + |k| \neq 0$; thus, the function is
not differentiable. The case $k > 0$ is similar.


FERMAT’S THEOREM. Let $f_0(x)$ be a function defined on an interval $[a, b]$
that contains in its interior a point $\hat{x}$ ($a < \hat{x} < b$) and differentiable at $\hat{x}$.
If $\hat{x}$ yields a local extremum (minimum or maximum) of this function then
$f_0'(\hat{x}) = 0$.

A precise definition of a local extremum was given earlier in this story. 

PROOF. We assume that $f_0'(\hat{x}) = k \neq 0$ and show that $\hat{x}$ is not a local
extremum. We suppose that $k > 0$ (the case $k < 0$ is similar). By the definition
of limit, the fact that $\lim_{x \to 0} |r(x)|/|x| = 0$ (where $r(x) = f_0(\hat{x} + x) - f_0(\hat{x}) - kx$)
implies that there is $\delta > 0$ such that if $|x| < \delta$ then $|r(x)| \le (k/2)|x|$. But then for
$0 < x < \delta$, $r(x) \ge -(k/2)x$, so that

$$f_0(\hat{x} + x) = f_0(\hat{x}) + kx + r(x) \ge f_0(\hat{x}) + kx - \frac{k}{2}x = f_0(\hat{x}) + \frac{k}{2}x > f_0(\hat{x}),$$

and for $-\delta < x < 0$, $r(x) \le -(k/2)x$, so that

$$f_0(\hat{x} + x) = f_0(\hat{x}) + kx + r(x) \le f_0(\hat{x}) + kx - \frac{k}{2}x = f_0(\hat{x}) + \frac{k}{2}x < f_0(\hat{x}).$$

In other words, to the left of $\hat{x}$ the value of $f_0$ is less than $f_0(\hat{x})$ and to the
right of $\hat{x}$ it is greater than $f_0(\hat{x})$. This means that $\hat{x}$ is neither a maximum
nor a minimum. This completes the proof.

The geometric sense of Fermat’s theorem is that at a maximum or minimum
point the tangent is horizontal. We also want to emphasize the “computational”
sense of an extremum that Kepler talks about (see the epigraph
to the sixth story). Consider, for example, the functions $f_1(x) = x$ and
$f_2(x) = x^2$. The first does not have an extremum at zero; the second does. If
we increase the argument from zero by $x$, then the first function changes by
the same amount and the second undergoes “imperceptible changes.” Specifically,
if $x = .01$ (which can still be represented on millimeter paper), then
$f_2(x) = .0001$, and this is altogether “imperceptible.”

This is how things are with Fermat’s theorem. We postpone the exposition 
of some historical material to the fourteenth story. 

It remains to prove Weierstrass’ theorem on the existence of an extremum
of a continuous function on a bounded interval. Since any interval can be
transformed into the unit interval $[0, 1]$, it is with the latter interval that
we will work from now on.

To begin, we will prove the following lemma on monotonic sequences
of numbers.


LEMMA. Every monotonic sequence of numbers in the unit interval has a 
limit in this interval. 


This means that if an infinite sequence of numbers $\{x_1, \dots, x_n, \dots\}$ is
such that all of its elements belong to the unit interval (that is, $0 \le x_n \le 1$,
$n = 1, 2, \dots$), and, furthermore, this sequence is monotonic (say, monotonically
increasing, that is, $x_1 \le x_2 \le \dots \le x_n \le \dots$), then there is a number
$x_0$ in the unit interval ($0 \le x_0 \le 1$) such that $\lim_{n \to \infty} x_n = x_0$.

Before proving the lemma, we point out that a number in the unit interval
is representable by an infinite decimal $0.n_1 n_2 n_3 \dots$ where each $n_i$ is one of the
ten digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.

PROOF. Consider the first digit after the decimal point in each of the
decimal representations of the numbers in our sequence $\{x_1, \dots, x_n, \dots\}$.
These are integers not less than 0 and not greater than 9. Since our sequence
is monotonically increasing, these integers likewise form a monotonically
increasing sequence. One of these integers, denote it by $n_1$, must repeat
infinitely many times. Let $x_{N_1}$ be the first number in $\{x_1, \dots, x_n, \dots\}$ whose
first digit after the decimal point is $n_1$. Then our sequence of integers cannot
contain an integer greater than $n_1$; otherwise, in view of the monotonicity
of our sequence, $n_1$ could not reappear.

Next we consider the second digit after the decimal point in each of the numbers
$\{x_{N_1}, x_{N_1 + 1}, \dots\}$. These again form a monotonically increasing sequence of
integers not less than 0 and not greater than 9. We again take the integer
$n_2$ that appears infinitely many times and the first number $x_{N_2}$ of the new
sequence whose second digit after the decimal point is $n_2$. By continuing
in this way, we obtain an infinite decimal $0.n_1 n_2 \dots$ that represents some
number $x_0$ in the unit interval. Beginning with $x_{N_1}$, all numbers in the
original sequence are of the form $0.n_1 \dots$. Beginning with $x_{N_2}$, all numbers
are of the form $0.n_1 n_2 \dots$, and so on. It follows that for $n = 1, 2, \dots$

$$x_n \le x_0, \quad \text{that is,} \quad x_n - x_0 \le 0,$$

and that for $n \ge N_s$,

$$x_0 - x_n \le 10^{-s}.$$

This implies that $\lim_{n \to \infty} x_n = x_0$, which is what we wished to prove.

WEIERSTRASS’ THEOREM. A continuous function on a finite interval takes 
on its maximal and minimal values. 


We recall that a function $y = f(x)$ defined on an interval $[a, b]$ containing
a point $x_0$ ($a \le x_0 \le b$) is said to be continuous at the point $x_0$ if for
every $\varepsilon > 0$ there is a $\delta > 0$ such that $|x - x_0| < \delta$, $a \le x \le b$, implies that
$|f(x) - f(x_0)| < \varepsilon$. An immediate consequence of this definition is that if $f$
is continuous at $x_0$ and $\{x_1, \dots, x_n, \dots\}$ is a sequence converging to $x_0$
($\lim_{n \to \infty} x_n = x_0$), then the sequence $\{f(x_1), \dots, f(x_n), \dots\}$ converges to
$f(x_0)$ ($\lim_{n \to \infty} f(x_n) = f(x_0)$). A function is said to be continuous on an
interval if it is continuous at each point of this interval. A function $y = f(x)$
defined on $[a, b]$ is said to take on at the point $x_0$ its maximal (minimal)
value on $[a, b]$ if $f(x_0) \ge f(x)$ ($f(x_0) \le f(x)$) for all $x$ in $[a, b]$.

We can now prove Weierstrass’ theorem. We will prove it for a maximum. 

PROOF. Let the function $y = f(x)$ be defined and continuous on the unit
interval $[0, 1]$. Take two intervals $\Delta_1 = [a_1, b_1]$ and $\Delta_2 = [a_2, b_2]$ in
$[0, 1]$. We will say that $\Delta_1$ is better than $\Delta_2$ if there is a point $\bar{x}$ in $\Delta_1$
such that $f(\bar{x}) > f(x)$ for all $x$ in $\Delta_2$.

We divide the interval $\Delta^0 = [0, 1]$ into two equal intervals $\Delta_1^1 = [0, 1/2]$
and $\Delta_2^1 = [1/2, 1]$.

We choose the better of the intervals $\Delta_1^1$ and $\Delta_2^1$; if neither is better, then
we choose either one of the two. We denote by $x_1$ the left endpoint of the
selected interval $\Delta^1$.

We claim that, in view of our choice, for each point $x$ not in $\Delta^1$ there is
a point $\bar{x}$ (that may depend on $x$) in $\Delta^1$ such that $f(\bar{x}) \ge f(x)$. In fact, if
$\Delta^1$ is better, then our proof is complete. If $\Delta^1$ is not better and there is no
such $\bar{x}$, then this implies that the other interval is better, and this contradicts
our choice.

Next, we divide the interval $\Delta^1$ into two equal intervals $\Delta_1^2$ and $\Delta_2^2$ and
again choose the better interval or either. We denote by $x_2$ the left endpoint
of the selected interval $\Delta^2$. In view of our choice, we can again claim that
for each point $x$ not in $\Delta^2$ there is a point $\bar{x}$ in $\Delta^2$ such that $f(\bar{x}) \ge f(x)$
(think this through).

Proceeding in this manner, we end up with a monotonic sequence
$\{x_1, \dots, x_n, \dots\}$ of elements in $[0, 1]$. In view of our lemma, this sequence
converges to some limit $x_0$. We’ll prove that $f(x_0) \ge f(x)$ for all
$x$ in $[0, 1]$. In fact, assume that for some $\tilde{x}$, $f(\tilde{x}) > f(x_0)$. Choose $\delta$ so small
that $|x_0 - \tilde{x}| > \delta$ and that $|x - x_0| < \delta$, $0 \le x \le 1$, implies $f(x) < f(\tilde{x})$.
The lengths of the intervals $\Delta^n$ are $2^{-n}$ and their left endpoints tend to $x_0$.
This means that, at some moment, the whole interval $\Delta^n$ is in the interval
$(x_0 - \delta, x_0 + \delta)$. But then, on the one hand, this $\Delta^n$ contains a point $\bar{x}$ such
that $f(\bar{x}) \ge f(\tilde{x})$ and, on the other hand (since $|\bar{x} - x_0| < \delta$), $f(\bar{x}) < f(\tilde{x})$.
This contradiction proves our theorem.
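The halving procedure used in the proof can be imitated on a computer, with one important caveat: deciding which half is “better” requires comparing values of $f$ at infinitely many points. The Python sketch below replaces that comparison by a comparison of maxima over a finite grid, which is an assumption of this illustration and not part of the proof. The function name `approximate_maximizer`, the grid size, the number of steps, and the test function are likewise hypothetical.

```python
def approximate_maximizer(f, steps=25, samples=200):
    """Bisection sketch of the proof: keep the half whose sampled maximum is larger.

    Assumption: the exact comparison of halves ("better") is replaced by a
    comparison of maxima over a finite grid, so the result is only approximate.
    """
    left, right = 0.0, 1.0
    left_endpoints = []

    def sampled_max(a, b):
        # Largest value of f on an evenly spaced grid covering [a, b].
        return max(f(a + (b - a) * i / samples) for i in range(samples + 1))

    for _ in range(steps):
        mid = (left + right) / 2
        # Ties go to the left half, mirroring "choose either one of the two".
        if sampled_max(left, mid) >= sampled_max(mid, right):
            right = mid
        else:
            left = mid
        left_endpoints.append(left)
    return left, left_endpoints

x0, endpoints = approximate_maximizer(lambda x: -(x - 0.3) ** 2)
print(x0)
```

The recorded left endpoints form a nondecreasing sequence in $[0, 1]$, just like the sequence $\{x_n\}$ in the proof, and for this test function they approach the maximum point $0.3$.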

It is now easy to prove the corollary of our theorem (formulated in the 
early part of this story). 

Let $A$ be a number such that for $|x| > A$ we have $f(x) > f(0)$. By
Weierstrass’ theorem, there is a point $x_0$ in $[-A, A]$ such that $f(x_0) \le
f(x)$ for all $x$ in $[-A, A]$. In particular, $f(x_0) \le f(0)$. For $|x| > A$,
$f(x) > f(0) \ge f(x_0)$. Hence $f(x) \ge f(x_0)$ for all $x$, which was to be
proved. The two remaining cases (involving rays) are just as easy to prove.

Thus all the facts taken up in the first section of this story have been 
established. 


The Twelfth Story 


12 


Extrema of Functions 
of Many Variables. 
The Lagrange Principle 


One can state the following general principle. If one 
is looking for the maximum or minimum of some 
function of many variables subject to the condition 
that these variables are related by a constraint given 
by one or more equations, then one should add to 
the function whose extremum is sought the functions 
that yield the constraint equations each multiplied by 
undetermined multipliers and seek the maximum or 
minimum of the resulting sum as if the variables were 
independent. The resulting equations, combined with 
the constraint equations, will serve to determine all 
unknowns. 


J. Lagrange 


1. In this story we will discuss ways to solve extremum problems for
functions of many variables. The essence of the matter is expressed by Lagrange
in the words we chose as an epigraph. To understand the first part of this
story one must be familiar with the concept of “a continuous function of
many variables” (discussed in the tenth story) and the material in the first
part of the eleventh story.


Let $f_0, f_1, \dots, f_m$ be functions of $n$ variables $x = (x_1, \dots, x_n)$. In
principle, we will consider problems where the constraints are equalities as
well as inequalities:

(p)   $$f_0(x) \to \min\ (\max), \quad f_i(x) = 0, \ i = 1, \dots, m', \quad f_i(x) \le 0, \ i = m'+1, \dots, m.$$

For the most part, however, we will deal with problems where the constraints
are equalities only. Examples are

($p_1$)   $$f_0(x) = (x_1 - a_1)^2 + (x_2 - a_2)^2 + (x_1 - b_1)^2 + (x_2 - b_2)^2 + (x_1 - c_1)^2 + (x_2 - c_2)^2 \to \min, \quad x = (x_1, x_2).$$

($p_2$)   $$f_0(x) = x_1 \cdots x_n \to \max, \quad f_1(x) = x_1^2 + \dots + x_n^2 - 1 = 0.$$


Recall that for $n = 2$, ($p_2$) is the planimetric Kepler problem and for $n = 3$
it is the classical Kepler problem of the parallelepiped of maximal volume
inscribed in a sphere. As for ($p_1$), it is the formalization of the following
problem: Find the point in the plane such that the sum of the squares of its
distances from three given points is a minimum (compare this with the Steiner
problem).

Obviously, not every problem of type (p) has a solution. However, just as 
in the case of a function of one variable, it is possible to formulate a theorem 
(also proved by Weierstrass) that guarantees the existence of a solution in 
many cases. 

Let $C$ denote the set of admissible points in problem (p). This means
that $C$ consists of the points $x$ such that

$$f_i(x) = 0, \ i = 1, \dots, m', \qquad f_i(x) \le 0, \ i = m'+1, \dots, m.$$

The set $C$ is said to be bounded if there is a constant $A > 0$ such that
$|x_i| \le A$, $i = 1, \dots, n$, for all $x = (x_1, \dots, x_n)$ in $C$. For example, the
set $x_1^2 + \dots + x_n^2 = 1$ is bounded (in particular, the circle $x_1^2 + x_2^2 = 1$ is
bounded) and the set $x_2 = x_1^2$ (a parabola) is unbounded.

THE THEOREM OF WEIERSTRASS. Assume that the functions $f_0, \dots, f_m$
in problem (p) are continuous and the set $C$ of admissible points in (p) is
bounded. Then the problems

($p_{\min}$)   $$f_0(x) \to \min, \quad f_i(x) = 0, \ i = 1, \dots, m', \quad f_i(x) \le 0, \ i = m'+1, \dots, m,$$

and

($p_{\max}$)   $$f_0(x) \to \max, \quad f_i(x) = 0, \ i = 1, \dots, m', \quad f_i(x) \le 0, \ i = m'+1, \dots, m$$

are solvable.


As in the previous story, we note the following corollary to this theorem.

COROLLARY. If the function $f_0(x)$ ($x = (x_1, \dots, x_n)$) is continuous for
all $x$ and $\lim f_0(x) = \infty$ for $x_1^2 + \dots + x_n^2 \to \infty$, then the unconstrained
problem

$$f_0(x) \to \min$$

is solvable.


We repeat the definition of a local extremum in problem (p). 

DEFINITION. A point $\hat{x} = (\hat{x}_1, \dots, \hat{x}_n)$ is said to yield a local minimum
(maximum) in problem (p) if there is an $\varepsilon > 0$ such that for all admissible
points $x = (x_1, \dots, x_n)$ for which

$$|x_i - \hat{x}_i| < \varepsilon, \quad i = 1, \dots, n,$$

the inequality $f_0(x) \ge f_0(\hat{x})$ ($f_0(x) \le f_0(\hat{x})$) holds.

If $\hat{x}$ yields a local extremum for the unconstrained problem (p) then we
also say that $\hat{x}$ yields a local extremum of the function $f_0$.

Before we can formulate the fundamental rule for the solution of problems 
of type (p), we must introduce one more concept. 

Let $y = f(x)$ be a function defined for all $x = (x_1, \dots, x_n)$ satisfying
the inequalities $a_j \le x_j \le b_j$, $j = 1, \dots, n$ (the set of such points is called
the parallelepiped $\Pi(a_1, b_1; \dots; a_n, b_n)$), and let $x_0 = (x_{01}, \dots, x_{0n})$ be
a point satisfying the strict inequalities $a_j < x_{0j} < b_j$, $j = 1, \dots, n$. We
will consider the following function of one variable:

$$g_j(x) = f(x_{01}, \dots, x_{0,j-1}, x_{0j} + x, x_{0,j+1}, \dots, x_{0n}).$$

What have we done? We have fixed all but the $j$th coordinate of $x_0$ and
added $x$ to the $j$th coordinate. Now we assume that the function $g_j$ is
differentiable at zero.

DEFINITION 2. The derivative at zero of the function $g_j$ is called the
$j$th partial derivative of the function $f$ at the point $x_0$ and is denoted by
$\partial f(x_0)/\partial x_j$.
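The definition translates almost literally into a computation: form the one-variable function $g_j$ and differentiate it at zero. The Python sketch below estimates that derivative by a symmetric difference quotient. The helper name `partial_derivative`, the step size, and the sample function $f(x_1, x_2) = x_1^2 x_2 + 3x_2$ are assumptions chosen for illustration; note that the code counts coordinates from 0, whereas the text counts from 1.

```python
def partial_derivative(f, point, j, step=1e-6):
    """Approximate the partial derivative of f at `point` in coordinate j (0-based).

    g(x) is f evaluated at `point` with x added to coordinate j; its derivative
    at zero is estimated by a symmetric difference quotient.
    """
    def g(x):
        shifted = list(point)
        shifted[j] += x
        return f(shifted)
    return (g(step) - g(-step)) / (2 * step)

def f(x):
    # Sample function, chosen only for illustration: x1^2 * x2 + 3 * x2.
    return x[0] ** 2 * x[1] + 3 * x[1]

print(partial_derivative(f, [1.0, 2.0], 0))  # analytic value: 2*x1*x2 = 4
print(partial_derivative(f, [1.0, 2.0], 1))  # analytic value: x1^2 + 3 = 4
```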

Now we formulate two theorems that will enable us to state a rule for the 
solution of the problems of type (p). We'll consider first the unconstrained 
problem (p). 


FERMAT’S THEOREM. Suppose that all partial derivatives of the function $f_0$
exist at the point $\hat{x}$. If $\hat{x}$ yields a local extremum (minimum or maximum)
of $f_0$ then

(1)   $$\frac{\partial f_0(\hat{x})}{\partial x_j} = 0, \quad j = 1, \dots, n.$$


Points at which all partial derivatives vanish are called stationary. Of 
course, just as in the case n = 1, condition (1) is a necessary condition. 

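By way of illustrating the use of this theorem, we will solve the problem
($p_1$). The corollary of Weierstrass’ theorem guarantees the existence of a
solution, which we denote by $(\hat{x}, \hat{y})$. We have

$$0 = \frac{\partial f_0(\hat{x}, \hat{y})}{\partial x_1} = 2(\hat{x} - a_1) + 2(\hat{x} - b_1) + 2(\hat{x} - c_1) \ \Rightarrow\ \hat{x} = \frac{a_1 + b_1 + c_1}{3},$$

$$0 = \frac{\partial f_0(\hat{x}, \hat{y})}{\partial x_2} = 2(\hat{y} - a_2) + 2(\hat{y} - b_2) + 2(\hat{y} - c_2) \ \Rightarrow\ \hat{y} = \frac{a_2 + b_2 + c_2}{3}.$$

Hence the answer: $(\hat{x}, \hat{y})$ is the center of gravity of the triangle with vertices
$(a_1, a_2)$, $(b_1, b_2)$, $(c_1, c_2)$.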
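The answer to ($p_1$) is easy to test numerically. The Python sketch below computes the center of gravity of a sample triangle and compares the value of $f_0$ there with its values at randomly chosen points. The triangle, the sampling range, and the helper names `sum_of_squares` and `centroid` are assumptions made for illustration.

```python
import random

def sum_of_squares(point, vertices):
    """f_0 from problem (p_1): sum of squared distances to the given points."""
    return sum((point[0] - vx) ** 2 + (point[1] - vy) ** 2 for vx, vy in vertices)

def centroid(vertices):
    """Coordinate-wise average of the given points."""
    n = len(vertices)
    return (sum(v[0] for v in vertices) / n, sum(v[1] for v in vertices) / n)

vertices = [(0.0, 0.0), (4.0, 0.0), (1.0, 3.0)]  # sample triangle
best = centroid(vertices)
best_value = sum_of_squares(best, vertices)

rng = random.Random(1)
others = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(1000)]
never_beaten = all(sum_of_squares(p, vertices) >= best_value - 1e-9 for p in others)
print(best, best_value, never_beaten)
```

No sampled point should give a smaller sum than the center of gravity, in agreement with the stationary point found from condition (1).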

Now we will consider the general problem (p), except that we will leave
out inequalities. We form the sum

$$\mathcal{L}(x, \lambda) = \lambda_0 f_0(x) + \lambda_1 f_1(x) + \dots + \lambda_m f_m(x),$$

where $x = (x_1, \dots, x_n)$ and $\lambda = (\lambda_0, \lambda_1, \dots, \lambda_m)$. We call $\mathcal{L}(x, \lambda)$ the
Lagrange function and the numbers $\lambda_0, \lambda_1, \dots, \lambda_m$ the Lagrange multipliers.

The following is an abbreviated version of the general Lagrange principle
in our epigraph: in order to solve problem (p) (with equalities only) one
forms the Lagrange function and treats it as if the variables $x_1, \dots, x_n$ were
independent (that is, one applies Fermat’s theorem). Then one solves the
resulting equations

(2)   $$\frac{\partial \mathcal{L}(x, \lambda)}{\partial x_j} = 0, \quad j = 1, \dots, n,$$

supplemented by the constraint equations

(3)   $$f_i(x) = 0, \quad i = 1, \dots, m,$$

with respect to the variables $x_1, \dots, x_n, \lambda_0, \lambda_1, \dots, \lambda_m$ and selects from
among these solutions the required one.

This rule is based on the following theorem.


THEOREM (the Lagrange multiplier rule). Let $f_0, \dots, f_m$ be functions
defined in a parallelepiped $\Pi(a_1, b_1; a_2, b_2; \dots; a_n, b_n)$ that contains in its
interior a point $\hat{x} = (\hat{x}_1, \dots, \hat{x}_n)$ ($a_i < \hat{x}_i < b_i$, $i = 1, \dots, n$). Also,
let all functions $f_i$, $i = 0, 1, \dots, m$, and all partial derivatives $\partial f_i/\partial x_j$,
$i = 0, 1, \dots, m$; $j = 1, \dots, n$, be continuous in this parallelepiped. If the
admissible point $\hat{x}$ yields a local extremum (minimum or maximum) then
there are numbers $\lambda_0, \lambda_1, \dots, \lambda_m$, not all zero, such that

$$\frac{\partial \mathcal{L}(\hat{x}, \lambda)}{\partial x_j} = 0, \quad j = 1, \dots, n$$

($\hat{x} = (\hat{x}_1, \dots, \hat{x}_n)$, $\lambda = (\lambda_0, \lambda_1, \dots, \lambda_m)$).


Two observations are in order. First, while the system (2)-(3) contains
$n + m$ equations in $n + m + 1$ unknowns, it must be borne in mind that the
Lagrange multipliers can be multiplied by any nonzero constant. Since we can
always multiply so that one of the Lagrange multipliers is 1, we can say that in
the system (2)-(3) the number of equations is actually the same as the number
of unknowns. A second observation is that equations (2) are most meaningful
if $\lambda_0 \neq 0$. Indeed, if $\lambda_0 = 0$, then equations (2) simply reflect the degeneracy
of the constraints and are not related to the function whose extremum is
sought. Usually one imposes additional constraints to ensure that $\lambda_0 \neq 0$ (in
the theorem just stated, for $m = 2$, a sufficient condition is that the vectors
$(\partial f_1(\hat{x})/\partial x_1, \dots, \partial f_1(\hat{x})/\partial x_n)$ and $(\partial f_2(\hat{x})/\partial x_1, \dots, \partial f_2(\hat{x})/\partial x_n)$ are
not proportional). But one must not assume a priori that $\lambda_0 \neq 0$ (this is
what Lagrange does; see the epigraph). The following example shows that
the Lagrange multiplier rule may fail if we make the additional assumption
that $\lambda_0 \neq 0$.

FIGURE 12.1

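EXAMPLE. We consider the problem (Figure 12.1):

$$x_1 \to \min, \quad x_1^3 - x_2^2 = 0.$$

Figure 12.1 shows that the only solution of the problem is the point $\hat{x} =
(0, 0)$. We try to form the Lagrange function with $\lambda_0 = 1$ and apply the
Lagrange algorithm:

$$\mathcal{L} = x_1 + \lambda(x_1^3 - x_2^2),$$

and

$$\frac{\partial \mathcal{L}}{\partial x_1} = 0 \ \Rightarrow\ 3\lambda x_1^2 + 1 = 0, \qquad \frac{\partial \mathcal{L}}{\partial x_2} = 0 \ \Rightarrow\ -2\lambda x_2 = 0.$$

The constraint equation is $x_1^3 - x_2^2 = 0$. The resulting system is obviously
inconsistent.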
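The degeneracy in this example can be seen by evaluating the partial derivatives of the full Lagrange function, with $\lambda_0$ kept as a free multiplier, at the solution $(0, 0)$. The Python sketch below does this; the function name `lagrange_gradient` and the sample multiplier values are assumptions made for illustration.

```python
def lagrange_gradient(lam0, lam1, x1, x2):
    """Partial derivatives in x1 and x2 of L = lam0*x1 + lam1*(x1**3 - x2**2)."""
    return (lam0 + 3 * lam1 * x1 ** 2, -2 * lam1 * x2)

origin = (0.0, 0.0)
# With lam0 = 1 the first component equals 1 at the origin, whatever lam1 is.
print(lagrange_gradient(1.0, 5.0, *origin))
# With lam0 = 0 and lam1 = 1 both components vanish, as the theorem requires.
print(lagrange_gradient(0.0, 1.0, *origin))
```

At the origin the first component reduces to $\lambda_0$, so the equations (2) can hold there only with $\lambda_0 = 0$; the multipliers $(0, 1)$ are not all zero, and the multiplier rule itself is not violated.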

The Lagrange multiplier rule yields the following recipe for looking for 
solutions of problem (p) with equalities. We will divide it into four stages. 

The first stage is the formalization of the problem. Here we try (if possible)
to reduce the problem to the form (p) with $m' = m$. The second stage is
the application of the Lagrange principle, that is, setting down the system of
equations (2)-(3).

The third stage is finding all stationary points. Here it may be useful to
first clarify the question of whether $\lambda_0$ can equal zero or not.

The fourth stage is selecting from among the stationary points the points
where the function $f_0$ takes on its least (largest) value.

We give this solution recipe the short name of the Lagrange principle. 

The theorems just formulated imply that if (in the problem without
inequalities) the set of admissible points is bounded and all functions $f_0, \dots, f_m$
as well as all partial derivatives $\partial f_i/\partial x_j$, $i = 0, \dots, m$, $j = 1, \dots, n$, are
continuous, then the rule just given yields a solution of the problem.
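As an illustration of the four stages, consider problem ($p_2$) with $n = 2$: $x_1 x_2 \to \max$, $x_1^2 + x_2^2 - 1 = 0$. Here $\lambda_0 = 0$ is impossible (equations (2) would then force $\lambda_1 = 0$ as well), so one may take $\lambda_0 = 1$, and solving (2)-(3) by hand gives four stationary points. The Python sketch below checks those hand-computed points and compares the selected maximum with a brute-force scan of the circle. The variable names, the hand-derived list, and the scan resolution are assumptions of this illustration.

```python
import math

r = 1 / math.sqrt(2)
# Stationary points found by hand from the system (2)-(3) with lambda_0 = 1,
# each paired with its multiplier lambda_1.
stationary = [((r, r), -0.5), ((-r, -r), -0.5), ((r, -r), 0.5), ((-r, r), 0.5)]

def residuals(point, lam1):
    """Left-hand sides of (2) and (3) for L = x1*x2 + lam1*(x1^2 + x2^2 - 1)."""
    x1, x2 = point
    return (x2 + 2 * lam1 * x1, x1 + 2 * lam1 * x2, x1 ** 2 + x2 ** 2 - 1)

for point, lam1 in stationary:
    print(point, lam1, residuals(point, lam1), point[0] * point[1])

# Stage four: select the stationary point(s) with the largest value of f_0.
best_value = max(p[0] * p[1] for p, _ in stationary)

# Independent brute-force check along the circle x = (cos t, sin t).
scan_value = max(math.cos(t) * math.sin(t)
                 for t in (2 * math.pi * i / 100_000 for i in range(100_000)))
print(best_value, scan_value)
```

All residuals should vanish up to rounding, and both approaches should report a maximal value close to $1/2$.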

There is no need to study the art of differentiating functions of many 
variables; after all, the matter reduces to differentiating functions of one 
variable. One could proceed directly to the solution of problems, but we 
prefer to devote more time to discussion. 


2. At this point, just as in the previous story, we will clarify some of the 
points discussed in the first section. 

In the multidimensional case there is no need to clarify Fermat’s theorem
further, for it follows trivially from the one-dimensional case. Indeed, let the
function $f_0(x_1, \dots, x_n)$ have a local extremum at the point $(\hat{x}_1, \dots, \hat{x}_n)$.
Then the function $g_j(x)$ (similar to the one defined in §1) must have a local
extremum at 0. But then, according to the one-dimensional version of Fermat’s
theorem discussed in detail in the previous story,

$$g_j'(0) = 0.$$

From the definition of the $j$th partial derivative alone we see that

$$g_j'(0) = \frac{\partial f_0(\hat{x})}{\partial x_j}.$$

Juxtaposition of these two equalities for the different values of $j$ yields
Fermat’s theorem:

$$\frac{\partial f_0(\hat{x})}{\partial x_j} = \frac{\partial f_0(\hat{x}_1, \dots, \hat{x}_n)}{\partial x_j} = 0, \quad j = 1, \dots, n.$$

It remains to examine the Lagrange multiplier rule. We will begin with the
simplest situation, namely, when $n = 2$, $m = 1$, and the constraint is given
by a linear relation. In other words, we consider the problem

$$f_0(x_1, x_2) \to \max, \quad f_1(x_1, x_2) = a_1 x_1 + a_2 x_2 - b = 0.$$

The graph of $y = f_0(x_1, x_2)$ can be thought of as the landscape of a
mountainous region (recall what was said in this connection in the tenth
story). The relation $f_1(x_1, x_2) = a_1 x_1 + a_2 x_2 - b = 0$ determines a line in
the plane. Think of an electric transmission line being built in a mountainous
region along a path whose representation on the map is a line.

QUESTION. Where is the highest point of the route of the transmission
line?

Recall how mountains are represented on a map. The map shows level 
lines, that is, curves connecting points at the same altitude. Now think of the 
mutual disposition of the route (on the map) of the transmission line and the 
level line of the mountain at the highest point of the route. 

Clearly, at the point in question the route cannot cross the level line. For 
if it did, then it would be crossing from lower to higher values of the alti- 
tude. Hence an intersection point cannot be a point of maximal height. We 
conclude that at the maximum point the route must be tangent to the level 
line. 

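Now we consider the function $f_1(x_1, x_2) = a_1 x_1 + a_2 x_2 - b$. Its partial
derivatives are

$$\frac{\partial f_1}{\partial x_1} = a_1, \qquad \frac{\partial f_1}{\partial x_2} = a_2.$$

The vector $(a_1, a_2)$ is perpendicular to the route and to every level line
of the function $f_1(x_1, x_2)$. This is clear from the geometric sense of the
equation $a_1 x_1 + a_2 x_2 = b$. But this turns out to be always true. Specifically,
if $f(x_1, x_2)$ is a continuous function with continuous partial derivatives,
then the vector $(\partial f(\hat{x})/\partial x_1, \partial f(\hat{x})/\partial x_2)$ is perpendicular to the tangent at $\hat{x}$
to the level line $f(x) = f(\hat{x})$. As noted earlier, the route is tangent to the level
line at the maximum point of the altitude. All this implies that both vectors
$(\partial f_0(\hat{x})/\partial x_1, \partial f_0(\hat{x})/\partial x_2)$ and $(a_1, a_2)$ are perpendicular to the route and,
therefore, proportional, that is,

$$\frac{\partial f_0(\hat{x})}{\partial x_1} + \lambda a_1 = 0, \qquad \frac{\partial f_0(\hat{x})}{\partial x_2} + \lambda a_2 = 0,$$

or

$$\frac{\partial \mathcal{L}(\hat{x}, 1, \lambda)}{\partial x_1} = 0, \qquad \frac{\partial \mathcal{L}(\hat{x}, 1, \lambda)}{\partial x_2} = 0,$$

where

$$\mathcal{L}(x, \lambda_0, \lambda) = \lambda_0 f_0(x) + \lambda f_1(x).$$

In this special case we have arrived at the Lagrange multiplier rule.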
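The transmission-line picture can be tried out on a made-up landscape. In the Python sketch below the altitude is a single hill, $f_0(x_1, x_2) = -(x_1 - 1)^2 - (x_2 - 2)^2$, and the route is the line $x_1 + x_2 - 1 = 0$; both are assumptions chosen for illustration, as are the function names and the scanning grid. The code walks along the route, finds the highest sampled point, and checks that the gradient of the altitude there is proportional to $(a_1, a_2)$.

```python
def altitude(x1, x2):
    """Sample landscape (an assumption for illustration): one hill centered at (1, 2)."""
    return -(x1 - 1) ** 2 - (x2 - 2) ** 2

def altitude_gradient(x1, x2):
    """Partial derivatives of the sample altitude function."""
    return (-2 * (x1 - 1), -2 * (x2 - 2))

a1, a2, b = 1.0, 1.0, 1.0  # route: a1*x1 + a2*x2 - b = 0

# Walk along the route x = (t, (b - a1*t)/a2) and keep the highest sampled point.
ts = [-5 + 10 * i / 100_000 for i in range(100_001)]
t_best = max(ts, key=lambda t: altitude(t, (b - a1 * t) / a2))
top = (t_best, (b - a1 * t_best) / a2)

g1, g2 = altitude_gradient(*top)
cross = g1 * a2 - g2 * a1  # zero when (g1, g2) and (a1, a2) are proportional
lam = -g1 / a1             # multiplier in: gradient of f_0 + lam * (a1, a2) = 0
print(top, (g1, g2), cross, lam)
```

For this landscape the highest point of the route is near $(0, 1)$, the gradient there is close to $(2, 2)$, and the multiplier is close to $-2$.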
Now suppose that the function $f_1$ is not necessarily affine but has
continuous partial derivatives. We will again consider the families of level lines

$$f_0(x_1, x_2) = c_0, \qquad f_1(x_1, x_2) = c_1.$$

We assume that through each point of a certain part of the plane there
passes just one curve from each family. Suppose that the problem has a
solution at $\hat{x} = (\hat{x}_1, \hat{x}_2)$. Then all points of the curve $l_0$ given by the
equation

$$f_0(x_1, x_2) = c_0 = f_0(\hat{x}_1, \hat{x}_2)$$

must lie “on one side” of the curve $l_1$ given by the equation

$$f_1(x_1, x_2) = c_1 = f_1(\hat{x}_1, \hat{x}_2),$$

that is, these curves don’t intersect but are tangent to one another. What we
have tried to make clear is that if the function $f_0$ takes on an extremal value
at the point $\hat{x}$ on the curve $l_1$, then the curve $l_0$ is tangent to the curve $l_1$.

Recall once more that the vector $(\partial f_0(\hat{x})/\partial x_1, \partial f_0(\hat{x})/\partial x_2)$ is perpendicular
to the curve $l_0$ and the vector $(\partial f_1(\hat{x})/\partial x_1, \partial f_1(\hat{x})/\partial x_2)$ is perpendicular
to the curve $l_1$. If these curves are tangent to one another, then the
vectors in question are proportional, as is claimed in the Lagrange multiplier
rule (with $\lambda_0 = 1$).

In this book we won’t be able to prove the Lagrange principle (that is, the 
Lagrange multiplier rule). However, some explanations bearing on its proof 
will be given in the fourteenth story. 

At the beginning of the present story we formulated the general problem
involving equalities as well as inequalities. But later we dealt only with the
case of equalities. It is natural to ask what changes in the Lagrange principle
in the general case. Let’s turn to problem (p) posed at the beginning of this
story and assume that all functions $f_0(x), \dots, f_m(x)$ satisfy the conditions
of the theorem on the Lagrange multiplier rule formulated earlier. If an
admissible point $\hat{x}$ yields a local minimum in problem (p) then there are
numbers $\lambda_0, \lambda_1, \dots, \lambda_m$, not all zero, such that

$$\frac{\partial \mathcal{L}(\hat{x}, \lambda)}{\partial x_j} = 0, \quad j = 1, \dots, n$$

($\hat{x} = (\hat{x}_1, \dots, \hat{x}_n)$, $\lambda = (\lambda_0, \lambda_1, \dots, \lambda_m)$),
the multipliers at the functional and at the inequalities satisfy the
nonnegativeness conditions $\lambda_0 \ge 0$, $\lambda_{m'+1} \ge 0, \dots, \lambda_m \ge 0$, and the following
conditions (called supplementary slackness conditions) hold:

$$\lambda_j f_j(\hat{x}) = 0, \quad j = m'+1, \dots, m.$$

The supplementary slackness conditions mean that a Lagrange multiplier
$\lambda_j$ can be different from zero only at an “active” constraint, when at an
extremum point an inequality constraint is actually an equality: $f_j(\hat{x}) = 0$,
$j = m'+1, \dots, m$.
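A one-dimensional toy problem shows how the three groups of conditions work together. Consider $(x - 2)^2 \to \min$ subject to $x - c \le 0$; this problem, the function name `check_conditions`, and the numerical tolerance are assumptions introduced only for illustration. The Python sketch below checks, for a candidate point and candidate multipliers, the stationarity equation, the nonnegativeness conditions, and the supplementary slackness condition.

```python
def check_conditions(x_hat, lam0, lam1, c):
    """Check the stated conditions for: (x - 2)^2 -> min subject to x - c <= 0.

    Here L = lam0*(x - 2)^2 + lam1*(x - c); the function returns a tuple
    (stationarity, nonnegativeness, supplementary slackness).
    """
    stationarity = abs(lam0 * 2 * (x_hat - 2) + lam1) < 1e-12
    nonnegative = lam0 >= 0 and lam1 >= 0
    slackness = abs(lam1 * (x_hat - c)) < 1e-12
    return stationarity, nonnegative, slackness

# Active constraint (c = 1): the minimum is at x = 1, with lam1 = 2 > 0.
print(check_conditions(1.0, 1.0, 2.0, c=1.0))
# Inactive constraint (c = 3): the minimum is at x = 2, and lam1 must be 0.
print(check_conditions(2.0, 1.0, 0.0, c=3.0))
# A point that is not a minimum (c = 3, x = 1): stationarity forces lam1 = 2,
# and then the slackness condition fails.
print(check_conditions(1.0, 1.0, 2.0, c=3.0))
```

In the first two cases all three conditions hold; in the third the slackness condition rules the point out.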

What has changed? Firstly, “extremum” changed to “minimum.” When
inequalities are present, the type of extremum being considered is not
irrelevant. Before we can apply the result just formulated in the case of a
maximum problem or a problem involving inequalities of the form $f_j \ge 0$,
we must change the problem to one of the form (p) by possibly changing
some of the $f_j$, $j = 0, m'+1, \dots, m$, to $-f_j$. We’ve added, secondly, the
nonnegativeness conditions, and thirdly, the supplementary slackness
conditions.

When solving problems in this book we won’t need to use the theorem just
formulated. But its role, and the role of similar theorems, is very significant. 
In the first story we hinted that the old methods have turned out to be inad- 
equate for the solution of many economic problems. Now it is possible to be 
more concrete. 

Recall the transportation problem discussed twice before. Its formalization 
involves inequalities, and, in fact, it is difficult to formalize the problem 
without them. Earlier, one did not consider problems with inequalities. The 
just-stated addition to the Lagrange multiplier rule turned up twenty-odd 
years ago (not two hundred and fifty years ago). In addition, it became 
clear that, in the case of economic problems, the minimized functions and 
conditions are convex, even linear. This made it necessary to study convex 
functions and convex extremal problems. We, too, will now turn our attention 
to these issues. 

3. Convex functions and convex problems. We’ve already touched on convex
functions of one variable. Convex functions of many variables are defined in
an entirely similar manner. Thus a function $f(x) = f(x_1, \dots, x_n)$ is said
to be convex if for arbitrary points $x$ and $x'$ and any $\alpha$, $0 \le \alpha \le 1$, the
Jensen inequality

$$f(\alpha x + (1-\alpha)x') \le \alpha f(x) + (1-\alpha) f(x')$$

holds.

Examples of convex functions are, first, linear and affine functions. We
note also the distance function from a point to the origin

$$f(x_1, \dots, x_n) = \sqrt{x_1^2 + \dots + x_n^2}.$$

A function $y = f(x)$ is said to be strictly convex if in the Jensen inequality,
for $x \neq x'$ and $0 < \alpha < 1$, we have the strict inequality

$$f(\alpha x + (1-\alpha)x') < \alpha f(x) + (1-\alpha) f(x').$$

The functions $y = x^2$, $y = \sqrt{1 + x^2}$, and $x_1^2 + x_2^2$ are strictly convex,
whereas the function $y = |x|$ and the distance function are not. It is easy
to see that if a strictly convex function $y = f(x)$ attains its minimum at a
point $\hat{x}$ then this minimum is unique.

We noted that not all functions are differentiable and twice we discussed 
the example y = |x|. This function is convex and not differentiable at zero. 
The distance function is also differentiable everywhere except at the origin. 
However, if a convex function is differentiable and its derivative vanishes at 
some point, then the function attains its absolute extremum at this point. It 
is also true that the graph of a convex function always lies above any of its 
tangent planes. 

This fact is of great importance: for convex differentiable functions Fer- 
mat’s theorem is a sufficient condition for an extremum! This is one of the 
reasons why the theory of convex extremal problems is so complete. 
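
The Jensen inequality lends itself to a quick numerical sanity check. The following Python sketch (the helper names `jensen_gap` and `distance` are ours, chosen only for illustration) samples random pairs of points and looks at the gap between the two sides of the inequality for the distance function. Under these assumptions the gap should never be negative, and it should vanish for two points on the same ray from the origin, which is exactly why the distance function is convex but not strictly convex.

```python
import math
import random


def jensen_gap(f, x, y, alpha):
    """Right side minus left side of the Jensen inequality; >= 0 for convex f."""
    mid = [alpha * xi + (1 - alpha) * yi for xi, yi in zip(x, y)]
    return alpha * f(x) + (1 - alpha) * f(y) - f(mid)


def distance(x):
    """Distance from the point x to the origin."""
    return math.sqrt(sum(xi * xi for xi in x))


def random_point(n):
    return [random.uniform(-5.0, 5.0) for _ in range(n)]


random.seed(0)  # fixed seed so the experiment is repeatable

# Smallest gap observed over many random pairs of points in 3-space.
min_gap = min(
    jensen_gap(distance, random_point(3), random_point(3), random.random())
    for _ in range(1000)
)

# Two points on one ray from the origin: equality holds, so no strict convexity.
ray_gap = jensen_gap(distance, [1.0, 2.0, 2.0], [2.0, 4.0, 4.0], 0.5)
```

A random experiment of this kind proves nothing, of course; it only illustrates what the inequality says.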


———The Thirteenth Story —————
More Problem Solving


Here we intend to fulfill our earlier (maybe rash) promise. Our aim is to 
solve again all problems from Part One “in the same, standard, you might 
even say routine, way,” using the same (simple) method, namely, the La- 
grange principle, or, in special cases, Fermat’s theorem. (We won’t be able to 
do this in three cases: the classical isoperimetric problem, the brachistochrone
problem, and Newton’s problem. To solve these problems routinely, we'll 
have to invest more effort.) 

Our standard approach will involve four stages: (1) formalization; (2) 
the use of the Lagrange principle or Fermat’s theorem; (3) solution of the 
corresponding equations and location of the critical or stationary points; and 
(4) selection of the required points and discussion of the answer. 

Let’s begin with the problems that reduce to finding extrema of functions 
of one variable. 


1. Euclid’s problem on the parallelogram of maximal area inscribed in a 
triangle (fourth story) 


1° Formalization. Let’s turn again to Figure 4.1 on p. 28. As before, let
$H$ denote the height of the triangle $ABC$ and $b$ the length of $AC$. Let $x$
be the length of $AF'$. Then $0 \le x \le b$. Let $h = h(x)$ denote the height
of the triangle $BD'E'$. The similarity of the triangles $BD'E'$ and $ABC$
($D'E' \parallel AC$) implies that $h(x)/H = x/b$. The area of the parallelogram
$AD'E'F'$ is equal to $(H - h(x))x = H(b - x)x/b$. In sum, we arrive at the
following formalization: 


\[ (p_1) \qquad f_0(x) = \frac{H(b - x)x}{b} \to \max, \qquad 0 \le x \le b. \]


2° Necessary condition. $f_0'(x) = 0$.

3° Finding the critical points. The stationary points: $f_0'(x) =
(H(bx - x^2)/b)' = (b - 2x)H/b$, i.e., $f_0'(x) = 0$ only at the point $b/2$.
The critical points are $0$, $b/2$, and $b$.


4° Discussion. The function $f_0$ is continuous, differentiable everywhere,
and is considered on a finite interval. This means that the solution is among
the critical points. Sorting out the critical points:

\[ f_0(0) = f_0(b) = 0, \qquad f_0(b/2) > 0. \]

It follows that the solution of $(p_1)$ is $b/2$. Answer: The required
parallelogram $ADEF$ is characterized by the fact that the point $F$ is the midpoint
of the segment [AC]. The same fact was established by Euclid. 
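
Stages 3 and 4 of the standard approach, for a problem in one variable on a closed interval, amount to listing the critical points and comparing the values of $f_0$ at them. The following Python sketch shows this for Euclid’s problem; the helper `best_point` and the sample values $H = 3$, $b = 4$ are illustrative assumptions of ours.

```python
def best_point(f, candidates, maximize=True):
    """Stage 4: pick the candidate (critical point) with the best value of f."""
    pick = max if maximize else min
    return pick(candidates, key=f)


H, b = 3.0, 4.0  # hypothetical height and base of the triangle ABC


def area(x):
    """Area of the inscribed parallelogram, f_0(x) = H (b - x) x / b."""
    return H * (b - x) * x / b


# Stage 3: the endpoints of [0, b] and the stationary point b/2.
critical_points = [0.0, b / 2, b]

x_best = best_point(area, critical_points)
```

With these sample values `x_best` is the midpoint `b / 2`, and the maximal area `area(x_best)` equals one half of the area $Hb/2$ of the triangle.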


2. Archimedes’ problem on the spherical segment of largest volume among 
the isoepiphanic ones (fourth story) 


1° Formalization. Let $R$ be the radius of the sphere and $h$ the height
of the spherical segment. It is well known that the volume of a spherical
segment is $\pi h^2(R - h/3)$, and its lateral surface is $2\pi Rh$. Since the area of
the lateral surface is given, $2\pi Rh = a$, $R = a/(2\pi h)$. Substituting this value
for $R$ in the volume formula and noting that $h \le 2R = a/(\pi h)$, we obtain
the following formalization:

\[ (p_2) \qquad f_0(h) = \frac{ha}{2} - \frac{\pi h^3}{3} \to \max, \qquad 0 \le h \le \sqrt{a/\pi}. \]


2° Necessary condition: $f_0'(h) = 0$.

3° Finding the critical points. The stationary points: $f_0'(h) =
(ha/2 - \pi h^3/3)' = a/2 - \pi h^2$, i.e., $f_0'(h) = 0$ only if $h = \sqrt{a/(2\pi)}$. The
critical points: $0$, $\sqrt{a/(2\pi)}$, $\sqrt{a/\pi}$.


4° Discussion. The function $f_0$ is continuous and differentiable everywhere,
and is considered on a finite interval. Hence the solution is among
the critical points. Sorting out the critical points: $f_0(0) = 0$,
$f_0(\sqrt{a/(2\pi)}) = \sqrt{2}\,a^{3/2}/(6\sqrt{\pi})$,
$f_0(\sqrt{a/\pi}) = a^{3/2}/(6\sqrt{\pi})$. The point $\sqrt{a/(2\pi)}$ yields the maximal value. This
is the solution. Since $a = 2\pi Rh$, we obtain $h = R$. Answer: The required
spherical segment is a hemisphere—its height equals the radius. The same 
result was established by Archimedes. 


3. The problem of least area (fourth story) 


1° Formalization. Let’s turn once again to Figure 4.6 on p. 33. We draw
a line through the point $M$ parallel to $AB$ and denote by $N$ its point of
intersection with $AC$. Let $E'$ be a point on the ray $NC$, $a = |AN|$, $x =
|NE'|$, and $D'$ the point of intersection of the ray $AB$ and the line $E'M$.
Since $MN \parallel AB$, the triangles $ME'N$ and $AD'E'$ are similar. This means
that the ratio of their areas is the same as the ratio of the squares of the lengths
of the segments $[NE']$ and $[AE']$. But the area of the triangle $ME'N$ is
$xh/2$, where $h$ is the altitude from $M$ to $AC$. Hence the area of the triangle
$AD'E'$ is $h(x + a)^2/(2x)$, and we arrive at the following formalization:


\[ (p_3) \qquad f_0(x) = \frac{(a + x)^2}{x} \to \min, \qquad x > 0. \]


2° Necessary condition. $f_0'(x) = 0$.

3° Finding the stationary points.

\[ f_0'(x) = \left(\frac{(a + x)^2}{x}\right)' = \left(\frac{a^2}{x} + 2a + x\right)' = -\frac{a^2}{x^2} + 1, \]

that is, $f_0'(x) = 0$ only for $x = a$.


4° Discussion. The function $f_0$ satisfies the requirements of the corollary
to Weierstrass’ theorem in the eleventh story. This means that problem $(p_3)$
is solvable. Since $f_0$ is differentiable for $x > 0$, Fermat’s theorem implies
that the solution must be a stationary point. But the stationary point is unique.
This means that it is the solution. Answer: the required point $E$ is at a
distance of $2a$ from $A$. It follows that the point $M$ halves the segment $DE$
(in view of the similarity of the triangles ENM and EAD). We obtained 
this result before by geometric means. 


4. Heron’s problem. We’ve encountered this problem many times—in the 
first, second, tenth, and eleventh stories. It’s time now to solve it the standard 
way. 

1° Formalization. This was carried out in the tenth story:

\[ (p_4) \qquad f_0(x) = \sqrt{a^2 + x^2} + \sqrt{b^2 + (d - x)^2} \to \min. \]

2° Necessary condition. $f_0'(x) = 0$.


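3° Finding the stationary points. Using the theorem on differentiating a
function of a function, we obtain:

\[ \left(\sqrt{a^2 + x^2}\right)' = \frac{x}{\sqrt{a^2 + x^2}} \]

(this was carried out in detail in the tenth story) and

\[ \left(\sqrt{b^2 + (d - x)^2}\right)' = -\frac{d - x}{\sqrt{b^2 + (d - x)^2}}. \]

Hence

\[ (1) \qquad f_0'(x) = 0 \iff \frac{x}{\sqrt{a^2 + x^2}} - \frac{d - x}{\sqrt{b^2 + (d - x)^2}} = 0. \]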

To solve this equation we square, invert, and subtract 1 from both sides. This
yields the relation $(a/x)^2 = (b/(d - x))^2$. After eliminating the extraneous
root, we obtain the equality $x/a = (d - x)/b$ (which coincides with what was
said in the first story; see Figure 1.1). We denote the solution of the latter
equation by $\hat{x}$.


4° Discussion. As a sum of two convex functions, the function $f_0(x)$ is
convex. It is also smooth. From what was said at the end of the previous
story, we know that $\hat{x}$ is a solution of the problem. Since $f_0(x)$ is strictly
convex, the solution $\hat{x}$ is unique.

Let’s take another look at the very first figure in the book. The quantity
$\hat{x}/\sqrt{a^2 + \hat{x}^2}$ is equal to $\sin\varphi_1$, and $(d - \hat{x})/\sqrt{b^2 + (d - \hat{x})^2}$ to $\sin\varphi_2$. From
(1) it follows that $\sin\varphi_1 = \sin\varphi_2$, that is, $\varphi_1 = \varphi_2$.

Answer: What characterizes the solution of Heron’s problem is the equality 
of the angles of incidence and reflection—a fact we established at the very 
beginning of this book. 

In the first story we stated problems 1 and 2, which are close to Heron’s
problem and easily reduce to it. In problem 1, show that a solution exists.
If it does not coincide with the vertex of the angle, then it is the solution of 
Heron’s problem for B and C and the side of the angle on which the point 
A should be. This means that the angles of incidence and reflection at A 
must be equal. A similar assertion holds for the point B. This leads to the 
required construction. Problem 2 is solved in the same way. Therefore there 
is no need to solve these problems formally. 
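
The solution of Heron’s problem can be checked numerically. The equality $x/a = (d - x)/b$ gives $\hat{x} = ad/(a + b)$; the Python sketch below (the function names and the sample values $a = 1$, $b = 2$, $d = 6$ are our own illustrative choices) computes this point and the two sines that appear in the discussion.

```python
import math


def heron_point(a, b, d):
    """Solve x/a = (d - x)/b for x."""
    return a * d / (a + b)


def path_length(x, a, b, d):
    """f_0(x) = sqrt(a^2 + x^2) + sqrt(b^2 + (d - x)^2)."""
    return math.hypot(a, x) + math.hypot(b, d - x)


a, b, d = 1.0, 2.0, 6.0  # hypothetical sample data
x_hat = heron_point(a, b, d)

# Sines of the angles of incidence and reflection at the point x_hat.
sin_phi_1 = x_hat / math.hypot(a, x_hat)
sin_phi_2 = (d - x_hat) / math.hypot(b, d - x_hat)
```

For these values the two sines coincide, and `path_length(x_hat, a, b, d)` equals $\sqrt{d^2 + (a + b)^2}$, the length of the straight segment obtained by reflecting one of the points in the line.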


5. Snel’s problem on the law of refraction of light (third story). Let’s solve 
this problem the standard way, as Leibniz was the first to do. 


1° Formalization. We take the line separating the two media as the x-axis 
and the line through A perpendicular to it as the y-axis (see Figure 3.2 on 
page 21). Let the coordinates of A and B be A= (0,a), B=(d,-b). 
Let D’ be a point on the x-axis with coordinates (x, 0). The time it takes 


for light to traverse the path $AD'B$ is $\sqrt{a^2 + x^2}/v_1 + \sqrt{b^2 + (d - x)^2}/v_2$.
This leads to the following unconstrained problem:

\[ (p_5) \qquad f_0(x) = \frac{\sqrt{a^2 + x^2}}{v_1} + \frac{\sqrt{b^2 + (d - x)^2}}{v_2} \to \min. \]


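2° Necessary condition. $f_0'(x) = 0$.

3° Finding the stationary points.

\[ (1) \qquad f_0'(x) = \left(\frac{\sqrt{a^2 + x^2}}{v_1}\right)' + \left(\frac{\sqrt{b^2 + (d - x)^2}}{v_2}\right)' = \frac{x}{v_1\sqrt{a^2 + x^2}} - \frac{d - x}{v_2\sqrt{b^2 + (d - x)^2}}. \]

(Concerning differentiation, see the previous story.) In view of the monotonicity
of the functions $x/(v_1\sqrt{a^2 + x^2})$ and $(d - x)/(v_2\sqrt{b^2 + (d - x)^2})$ we
see that the equation $f_0'(x) = 0$ has a unique solution $\hat{x}$.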


4° Discussion. In view of the convexity of the function $f_0(x)$ Fermat’s
theorem is a sufficient condition for an extremum. Hence $\hat{x}$ is a solution.
The strict convexity of $f_0(x)$ implies that this solution is unique. Figure 3.2
on p. 21 and relation (1) imply the equality

\[ \frac{\sin\alpha_1}{v_1} = \frac{\sin\alpha_2}{v_2} \]
that expresses Snel’s law. Answer: the solution of Snel’s problem is charac- 
terized by the equality of the ratio of the sines of the angles of incidence and 
refraction and the ratio of the velocities in the first and second media—a fact 
we established in the third story. 
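
Unlike Heron’s problem, the equation $f_0'(x) = 0$ in Snel’s problem has no convenient closed-form solution, but because $f_0'$ is increasing it is easy to locate the root by bisection. The Python sketch below does this; bisection is our choice of numerical method, and the function names and the sample values are illustrative assumptions.

```python
import math


def travel_time_derivative(x, a, b, d, v1, v2):
    """f_0'(x) for the travel time along the path A D' B."""
    return x / (v1 * math.hypot(a, x)) - (d - x) / (v2 * math.hypot(b, d - x))


def snell_point(a, b, d, v1, v2, iterations=100):
    """Bisection on [0, d]: f_0' is negative at 0, positive at d, and increasing."""
    lo, hi = 0.0, d
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if travel_time_derivative(mid, a, b, d, v1, v2) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


a, b, d, v1, v2 = 1.0, 1.0, 2.0, 1.0, 0.75  # hypothetical sample data
x_hat = snell_point(a, b, d, v1, v2)

# Sines of the angles of incidence and refraction at x_hat.
sin_alpha_1 = x_hat / math.hypot(a, x_hat)
sin_alpha_2 = (d - x_hat) / math.hypot(b, d - x_hat)
```

At the computed point the ratios `sin_alpha_1 / v1` and `sin_alpha_2 / v2` agree to within rounding error, as Snel’s law requires.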


6. Kepler’s planimetric problem on a rectangle of maximal area inscribed 
in a circle. We discussed this problem in the fifth and tenth stories; it was 
formalized in the tenth story. 

1° Formalization. 


\[ (p_6) \qquad f_0(x) = x\sqrt{1 - x^2} \to \max, \qquad 0 \le x \le 1. \]


2° Necessary condition. $f_0'(x) = 0$.

3° Finding the critical points. The stationary points: $f_0'(x) = (x\sqrt{1 - x^2})'
= \sqrt{1 - x^2} + x(\sqrt{1 - x^2})' = \sqrt{1 - x^2} - x^2/\sqrt{1 - x^2} = 0 \iff 2x^2 = 1 \Rightarrow x =
\sqrt{2}/2$. Thus there are three critical points: $0$, $1$, and $\sqrt{2}/2$.

4° Discussion. The function $f_0$ is continuous on $[0, 1]$ and differentiable
in $(0, 1)$. This means that the solution is among the critical points. Sorting
out the critical points: $f_0(0) = f_0(1) = 0$, $f_0(\sqrt{2}/2) = 1/2$. Hence $\sqrt{2}/2$
is the solution to $(p_6)$. In this case, as implied by the formalization, the
rectangle is square. Answer: The largest rectangle inscribed in a circle is a 
square. 


7. Kepler’s problem on an inscribed cylinder. We talked about this problem 
in the sixth story. 


1° Formalization. Let $R$ be the radius of the sphere. Let $x$ be half the
height of the cylinder. Then $0 \le x \le R$. The base radius of the cylinder is
$r = \sqrt{R^2 - x^2}$ and its volume is $\pi r^2 h = 2\pi(R^2 - x^2)x$, where $h = 2x$. Hence the formalization:

\[ (p_7) \qquad f_0(x) = 2\pi(R^2 - x^2)x \to \max, \qquad 0 \le x \le R. \]


(Actually, this problem was formalized in the sixth story.) 
2° Necessary condition. $f_0'(x) = 0$.

3° Finding the critical points. The stationary points are

\[ f_0'(x) = (2\pi(R^2 - x^2)x)' = 2\pi(R^2x - x^3)' = 2\pi(R^2 - 3x^2) = 0 \iff x_1 = R/\sqrt{3}, \quad x_2 = -R/\sqrt{3}. \]

The second root is unsuitable ($x_2 < 0$). Hence there are three critical points:
$0$, $R/\sqrt{3}$, and $R$.


4° Discussion. The function $f_0$ is continuous and differentiable everywhere.
This means that the solution is among the critical points. Since
$f_0(0) = f_0(R) = 0$, the solution is $R/\sqrt{3}$. Hence the radius of the maximal
cylinder is $\sqrt{R^2 - R^2/3} = R\sqrt{2/3}$. Answer: The ratio of the height of the
extremal cylinder to the base diameter is $\sqrt{2}/2$. This is the fact established by
Kepler. 

We will now solve some algebraic problems. 


8. Tartaglia’s problem (fifth story). 


1° Formalization. Let $x$ be the smaller number. Then $0 \le x \le 4$ and
the larger number is $8 - x$. Their difference is $8 - 2x$. In sum,

\[ (p_8) \qquad f_0(x) = x(8 - x)(8 - 2x) \to \max, \qquad 0 \le x \le 4. \]


2° Necessary condition. $f_0'(x) = 0$.

3° Finding the critical points. The stationary points are

\[ f_0'(x) = (x(8 - x)(8 - 2x))' = (2x^3 - 24x^2 + 64x)' = 6x^2 - 48x + 64 = 0 \iff x_1 = 4 - 4/\sqrt{3}, \quad x_2 = 4 + 4/\sqrt{3}. \]

The second root is unsuitable ($x_2 > 4$). Thus there are three critical points:
$0$, $4$, and $4 - 4/\sqrt{3}$.


4° Discussion. The function $f_0$ is continuous and differentiable everywhere,
and is considered on a finite interval. This means that the solution is
among the critical points. Since $f_0(0) = f_0(4) = 0$ and $f_0(4 - 4/\sqrt{3}) > 0$, it
follows that $4 - 4/\sqrt{3}$ is the solution of $(p_8)$. Answer: The larger number
is $4 + 4/\sqrt{3}$ and the smaller one is $4 - 4/\sqrt{3}$. This fact was established by
Tartaglia. 
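
The computation in Tartaglia’s problem is easy to retrace by machine. The Python sketch below (variable names are ours) solves the quadratic equation $6x^2 - 48x + 64 = 0$ with the usual formula, keeps only the root lying in $[0, 4]$, and compares the values of $f_0$ at the critical points.

```python
import math


def product(x):
    """f_0(x) = x (8 - x)(8 - 2x): the product of the two parts and their difference."""
    return x * (8 - x) * (8 - 2 * x)


# Coefficients of f_0'(x) = 6 x^2 - 48 x + 64.
qa, qb, qc = 6.0, -48.0, 64.0
disc = qb * qb - 4 * qa * qc
roots = [(-qb - math.sqrt(disc)) / (2 * qa), (-qb + math.sqrt(disc)) / (2 * qa)]

# Endpoints of [0, 4] together with the stationary points inside the interval.
critical_points = [0.0, 4.0] + [r for r in roots if 0.0 <= r <= 4.0]

x_best = max(critical_points, key=product)
```

Here `x_best` agrees with $4 - 4/\sqrt{3}$, and the second root is discarded automatically because it exceeds 4.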


9. The inequality of the arithmetic-geometric means (fifth story). We consider
an auxiliary extremal problem:

\[ (p_9) \qquad f_0(x) = x_1 \cdot x_2 \cdots x_n \to \max, \]
\[ f_1(x) = x_1 + x_2 + \dots + x_n = 1, \]
\[ f_i(x) = x_{i-1} \ge 0, \quad i = 2, 3, \dots, n + 1 \qquad (x = (x_1, \dots, x_n)). \]

The functions $f_i$ and their partial derivatives are continuous. Since $0 \le
x_k \le 1$ for all $k$, the set of admissible points is bounded. Hence Weierstrass’
theorem implies the existence of a solution $\hat{x} = (\hat{x}_1, \dots, \hat{x}_n)$. Of course,
$\hat{x}_k \ne 0$; otherwise $f_0(\hat{x}) = 0$, whereas there exist admissible elements
with $f_0(x) > 0$.

Of course, $\hat{x}$ will also be a local maximum in problem $(p_9)$. Since, as was
just shown, $\hat{x}_k > 0$, it will also be a local maximum in the problem without
inequalities.


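1° Formalization.

\[ (p_9') \qquad f_0(x) \to \max, \qquad f_1(x) = 1. \]

The Lagrange function for $(p_9')$ is $\mathcal{L}(x, \lambda_0, \lambda_1) = \lambda_0 f_0(x) + \lambda_1 f_1(x)$.

2° Necessary condition. This condition is the Lagrange multiplier rule:

\[ \frac{\partial \mathcal{L}}{\partial x_j} = 0, \qquad j = 1, \dots, n. \]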


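3° Finding the stationary points. Let $A$ denote the product $\hat{x}_1 \cdots \hat{x}_n$.
Then

\[ \frac{\partial \mathcal{L}}{\partial x_j}(\hat{x}, \lambda_0, \lambda_1) = \lambda_0 \frac{A}{\hat{x}_j} + \lambda_1 = 0, \qquad j = 1, \dots, n. \]

The assumption $\lambda_1 = 0$ would imply the untenable conclusion that both
multipliers $\lambda_0$ and $\lambda_1$ are zero. Thus $\hat{x}_j = -\lambda_0 A/\lambda_1$, that is, $\hat{x}_1 = \dots =
\hat{x}_n = 1/n$ (since $\hat{x}_1 + \dots + \hat{x}_n = 1$).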


4° Discussion. Since the stationary point in (p’) is unique, it yields the 
solution of the problem. 

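We can now prove the required inequality. Let $a_1, \dots, a_n$ be arbitrary
nonnegative numbers. Put $S = a_1 + \dots + a_n$ and $x_i = a_i/S$. Then $x_1 + \dots +
x_n = 1$ and, by what was just proved,

\[ \frac{a_1 \cdot a_2 \cdots a_n}{S^n} = x_1 \cdot x_2 \cdots x_n \le \left(\frac{1}{n}\right)^n, \]
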


which is the required result. 
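
The normalization step $x_i = a_i/S$ can be tried out on random data. The following Python sketch (names and sample sizes are our own assumptions) forms the product $x_1 \cdots x_n$ for many random positive tuples and compares the largest value found with the bound $(1/n)^n$, which is attained when all the $a_i$ are equal.

```python
import math
import random


def normalized_product(a):
    """Product of x_i = a_i / S, where S = a_1 + ... + a_n."""
    s = sum(a)
    return math.prod(ai / s for ai in a)


random.seed(1)  # fixed seed so the experiment is repeatable
n = 5
bound = (1.0 / n) ** n

# Largest normalized product seen over many random positive tuples.
largest_seen = max(
    normalized_product([random.uniform(0.1, 10.0) for _ in range(n)])
    for _ in range(10000)
)

# Equal numbers give equality in the inequality.
equal_case = normalized_product([2.0] * n)
```

No sampled tuple exceeds `bound`, in agreement with the inequality of the arithmetic-geometric means.
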
10. The inequality of the arithmetic-quadratic means (fifth story). 


1° Formalization.

\[ (p_{10}) \qquad f_0(x) = x_1 + \dots + x_n \to \max, \qquad f_1(x) = x_1^2 + \dots + x_n^2 = 1 \qquad (x = (x_1, \dots, x_n)). \]

The functions $f_0$ and $f_1$ and their partial derivatives are continuous.
Since $-1 \le x_k \le 1$, $k = 1, \dots, n$, the set of admissible points is bounded.
This means that a solution exists, and we can use the Lagrange principle. The
Lagrange function is $\mathcal{L}(x, \lambda_0, \lambda_1) = \lambda_0 f_0(x) + \lambda_1 f_1(x)$.


2° Necessary condition. 


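3° Finding the stationary points.

\[ \frac{\partial \mathcal{L}}{\partial x_j}(\hat{x}, \lambda_0, \lambda_1) = \lambda_0 + 2\lambda_1 \hat{x}_j = 0. \]

The assumption that $\lambda_1 = 0$ leads to the untenable conclusion that both
Lagrange multipliers are zero. Hence $\hat{x}_j = -\lambda_0/(2\lambda_1)$, that is, $\hat{x}_1 = \dots = \hat{x}_n =
1/\sqrt{n}$ (for $\hat{x}_1^2 + \dots + \hat{x}_n^2 = 1$).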


4° Discussion. Since the stationary point is unique, it is the solution of 
the problem. 

Now we can prove the required inequality. Let $a_1, \dots, a_n$ be arbitrary
numbers. Put $S = (a_1^2 + \dots + a_n^2)^{1/2}$ and $x_i = a_i/S$. Then $x_1^2 + \dots + x_n^2 = 1$
and, in view of what was just proved,

\[ \frac{a_1 + \dots + a_n}{S} = x_1 + \dots + x_n \le \frac{n}{\sqrt{n}} = \sqrt{n}, \]



which is the required result. 
This also proves the assertion that (for nonnegative $a_i$)

\[ \sqrt[n]{a_1 \cdots a_n} \le \left(\frac{a_1^2 + \dots + a_n^2}{n}\right)^{1/2}, \]


and thus yields a solution of Kepler’s planimetric problem and of one of the 
stereometric Kepler problems discussed in the fifth story. 
Proceeding analogously, one can prove the following general inequalities 


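for means. Let $a_1, \dots, a_n$ be nonnegative numbers. Put

\[ S_p = \left(\frac{a_1^p + \dots + a_n^p}{n}\right)^{1/p}, \qquad \Sigma_p = (a_1^p + \dots + a_n^p)^{1/p}. \]

Then

\[ (1) \qquad S_p \le S_q \quad \text{if } p \le q, \]

\[ (2) \qquad \Sigma_p \le \Sigma_q \quad \text{if } p \ge q. \]

Earlier we proved (1) for $p = 1$, $q = 2$.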


11. The Cauchy-Bunyakovskii and Hölder inequalities (fifth story). Let
$a_1, \dots, a_n$ be fixed numbers not all zero.

1° Formalization. We consider the extremal problem

\[ (p_{11}) \qquad f_0(x) = a_1x_1 + \dots + a_nx_n \to \max, \qquad f_1(x) = x_1^2 + \dots + x_n^2 = B^2 \qquad (x = (x_1, \dots, x_n)). \]

The functions $f_0$ and $f_1$ and their partial derivatives are continuous.
Since $-B \le x_j \le B$, $j = 1, \dots, n$, the set of admissible points is bounded.
Hence a solution exists, and we can apply the Lagrange principle. The Lagrange
function is $\mathcal{L}(x, \lambda_0, \lambda_1) = \lambda_0 f_0(x) + \lambda_1 f_1(x)$. We put

\[ A = (a_1^2 + \dots + a_n^2)^{1/2}. \]

2° Necessary condition.

3° Finding the stationary points.

\[ \frac{\partial \mathcal{L}}{\partial x_j}(\hat{x}, \lambda_0, \lambda_1) = \lambda_0 a_j + 2\lambda_1 \hat{x}_j = 0, \qquad j = 1, \dots, n. \]

The possibility $\lambda_1 = 0$ implies the untenable conclusion that both Lagrange
multipliers are zero. We have

\[ \hat{x}_j = -\lambda_0 a_j/(2\lambda_1) = Ca_j, \qquad j = 1, \dots, n. \]

Since $\hat{x}_1^2 + \dots + \hat{x}_n^2 = B^2$, it follows that

\[ C^2(a_1^2 + \dots + a_n^2) = B^2 \Rightarrow C = \pm B/A. \]

4° Discussion. There are just two stationary points. The solution is the
point corresponding to the plus sign: $\hat{x}_j = Ba_j/A$.


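Now let $b_1, \dots, b_n$ be any $n$ numbers and $B^2 = b_1^2 + \dots + b_n^2$. By what
has just been proved,

\[ a_1b_1 + \dots + a_nb_n \le a_1\frac{Ba_1}{A} + \dots + a_n\frac{Ba_n}{A} = AB = (a_1^2 + \dots + a_n^2)^{1/2}(b_1^2 + \dots + b_n^2)^{1/2}, \]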


which is the Cauchy-Bunyakovskii inequality. 
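
The extremal point $\hat{x}_j = Ba_j/A$ and the maximal value $AB$ can be illustrated numerically. In the Python sketch below the helper names and the sample data $a = (3, 4)$, $B = 2$ are our own assumptions; random points on the sphere $x_1^2 + \dots + x_n^2 = B^2$ are produced by normalizing Gaussian samples.

```python
import math
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def maximizer_on_sphere(a, B):
    """The stationary point with the plus sign: x_j = B a_j / A."""
    A = math.sqrt(dot(a, a))
    return [B * aj / A for aj in a]


def random_point_on_sphere(n, B):
    """Random point with x_1^2 + ... + x_n^2 = B^2."""
    g = [random.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(dot(g, g))
    return [B * t / norm for t in g]


random.seed(2)  # fixed seed so the experiment is repeatable
a, B = [3.0, 4.0], 2.0  # hypothetical sample data, so A = 5

x_hat = maximizer_on_sphere(a, B)
max_value = dot(a, x_hat)  # should equal A * B = 10

largest_sampled = max(
    dot(a, random_point_on_sphere(len(a), B)) for _ in range(10000)
)
```

None of the sampled admissible points gives a value of $f_0$ larger than `max_value`, as the Cauchy-Bunyakovskii inequality predicts.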
The Hölder inequality is proved in the same way. We set down the necessary
computations without comment.

1° Formalization.

\[ (p_{11}') \qquad f_0(x) = a_1x_1 + \dots + a_nx_n \to \max, \qquad f_1(x) = |x_1|^p + \dots + |x_n|^p = B^p \qquad (a_i > 0,\ x = (x_1, \dots, x_n)). \]

The Lagrange function is $\mathcal{L} = \lambda_0 f_0(x) + \lambda_1 f_1(x)$. We set $A = (a_1^{p'} + \dots + a_n^{p'})^{1/p'}$,
$p^{-1} + p'^{-1} = 1$.


2° Necessary condition. 


3° Finding the stationary points. 


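\[ \lambda_0 a_j + p\lambda_1 |\hat{x}_j|^{p-1} \operatorname{sign} \hat{x}_j = 0 \Rightarrow \hat{x}_j = Ca_j^{p'-1}, \qquad C = \pm B/A^{p'/p}. \]

4° Solution: $\hat{x}_j = Ba_j^{p'-1}/A^{p'/p}$, $j = 1, \dots, n$.

Now let $b_1, \dots, b_n$ be any $n$ nonnegative numbers with $b_1^p + \dots + b_n^p = B^p$. By what was just
proved,

\[ a_1b_1 + \dots + a_nb_n \le a_1\frac{Ba_1^{p'-1}}{A^{p'/p}} + \dots + a_n\frac{Ba_n^{p'-1}}{A^{p'/p}} = BA^{p'(1 - 1/p)} = AB, \]

which is the Hölder inequality.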
Now we again turn to geometry.

12. The Steiner problem (fourth story). Let the coordinates of the three
given points be $A(a_1, a_2)$, $B(b_1, b_2)$, and $C(c_1, c_2)$. Let $D$ be a point with
coordinates $(x_1, x_2)$. Then the sum of the distances from $D$ to $A$, $B$, and
$C$ is

\[ f_0(x_1, x_2) = \sqrt{(x_1 - a_1)^2 + (x_2 - a_2)^2} + \sqrt{(x_1 - b_1)^2 + (x_2 - b_2)^2} + \sqrt{(x_1 - c_1)^2 + (x_2 - c_2)^2}. \]


This leads to an unconstrained problem. 


1° Formalization.

\[ (p_{12}) \qquad f_0(x_1, x_2) \to \min. \]

Note that if $x_1^2 + x_2^2$ is large, then $D$ is located far from $A$, $B$, and $C$, and so
the sum of its distances from the points $A$, $B$, and $C$ is also large. Hence
$x_1^2 + x_2^2 \to \infty$ implies $f_0(x) \to \infty$. But then we can use the corollary to
Weierstrass’ theorem in the previous story and conclude that problem $(p_{12})$
has a solution $\hat{x} = (\hat{x}_1, \hat{x}_2)$.

It is easy to see that the partial derivatives of the function $f_0$ exist and
are continuous at all points $x$ other than $A$, $B$, and $C$.


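2° Necessary (and, in view of the convexity of $f_0$, sufficient) condition: the
Fermat theorem. If $\hat{x} \ne A, B, C$, then

\[ \frac{\partial f_0(\hat{x})}{\partial x_1} = \frac{\partial f_0(\hat{x})}{\partial x_2} = 0. \]

3° Finding the stationary points.

\[ \frac{\partial f_0}{\partial x_1}(\hat{x}_1, \hat{x}_2) = \frac{\hat{x}_1 - a_1}{|DA|} + \frac{\hat{x}_1 - b_1}{|DB|} + \frac{\hat{x}_1 - c_1}{|DC|} = 0, \]

\[ \frac{\partial f_0}{\partial x_2}(\hat{x}_1, \hat{x}_2) = \frac{\hat{x}_2 - a_2}{|DA|} + \frac{\hat{x}_2 - b_2}{|DB|} + \frac{\hat{x}_2 - c_2}{|DC|} = 0. \]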

4° Discussion. We clarify the geometric sense of the relations just set
down. They state that the sum of the unit vectors

\[ \frac{\overrightarrow{DA}}{|DA|}, \qquad \frac{\overrightarrow{DB}}{|DB|}, \qquad \frac{\overrightarrow{DC}}{|DC|} \]

is zero. But then one can make out of them an equilateral triangle, that is, 
each of the angles ADB, BDC, and CDA is 120°. This means that if 


the solution does not coincide with one of the vertices of the triangle ABC, 
then $D$ is a point from which each side is seen at an angle of 120°. Hence
$D$ is just the Torricelli point talked about in the fourth story (where we also
learned how to construct it). If the obtuse angle in a triangle is > 120°, then
there is no point from which all sides can be seen at an angle of 120°. This
means that $\hat{x}$ must coincide with one of the vertices, namely, the vertex of
the obtuse angle, because the larger side lies opposite the larger angle. (Let
$C$ be the vertex of the obtuse angle. Then, using the natural symbols for the
sides, $c > a$ and $c > b$. This means that $a + b < a + c$ and $a + b < b + c$,
that is, $a + b$ is the least of the three sums.)

Answer. If all angles in the triangle are < 120°, then the required point 
is its Torricelli point. If one of the angles is > 120°, then the required point 
coincides with the vertex of this angle. This is just the answer we derived in 
the fourth story. 
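
The 120° property is easy to observe numerically. The Python sketch below does not follow the argument of this story: it uses Weiszfeld’s iteration, a standard fixed-point scheme for minimizing a sum of distances, which we substitute here purely as a convenient numerical tool. The function names and the sample triangle (all of whose angles are less than 120°) are our own assumptions.

```python
import math


def weiszfeld(points, iterations=2000):
    """Approximate the point minimizing the sum of distances to the given points."""
    # Start from the centroid of the given points.
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iterations):
        weights = []
        for px, py in points:
            dist = math.hypot(x - px, y - py)
            if dist == 0.0:
                return (x, y)  # landed exactly on a vertex; this sketch stops here
            weights.append(1.0 / dist)
        total = sum(weights)
        # New iterate: average of the points weighted by inverse distances.
        x = sum(w * p[0] for w, p in zip(weights, points)) / total
        y = sum(w * p[1] for w, p in zip(weights, points)) / total
    return (x, y)


def angle_at(d, p, q):
    """Angle PDQ in degrees."""
    v1 = (p[0] - d[0], p[1] - d[1])
    v2 = (q[0] - d[0], q[1] - d[1])
    cosine = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


A, B, C = (0.0, 0.0), (4.0, 0.0), (1.0, 3.0)  # hypothetical sample triangle
D = weiszfeld([A, B, C])
angles = [angle_at(D, A, B), angle_at(D, B, C), angle_at(D, C, A)]
```

For this triangle each entry of `angles` comes out as 120° to within rounding error, so the computed point is the Torricelli point.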

In the first story we formulated Problem 4, which is very close to Steiner’s 
problem. The answer to this problem was given in the fourth story. In the 
part pertaining to a convex quadrilateral, the answer follows immediately 
from the sufficiency of Fermat’s theorem for an unconstrained convex prob- 
lem. In the nonconvex case one proves, just as in Steiner’s problem, that a 
solution exists. Then one verifies that if the point is different from a vertex, 
then the necessary condition cannot be fulfilled. 


13. The problem of the least perimeter (fourth story). Let’s turn to Figure 
4.7. Consider lines through M parallel to AB and AC and denote by N 
and P their respective points of intersection with AC and AB. We denote 
$|AN|$ by $a$, $|AP|$ by $b$, the angle $BAC$ by $\alpha$, a segment through $M$ by
$[D'E']$ ($D'$ on $AB$, $E'$ on $AC$), $|NE'|$ by $x$, $|PD'|$ by $y$, the angle $AD'E'$
by $\psi$, and the angle $D'E'C$ by $\varphi$.

The similarity of the triangles $PD'M$ and $NME'$ implies that $x/b =
a/y \Rightarrow xy = ab$. By the law of cosines,

\[ |D'M| = \sqrt{y^2 + a^2 - 2ya\cos\alpha}, \qquad |E'M| = \sqrt{x^2 + b^2 - 2xb\cos\alpha}. \]

It follows that the perimeter of the triangle $AD'E'$ is

\[ a + y + \sqrt{y^2 + a^2 - 2ya\cos\alpha} + b + x + \sqrt{x^2 + b^2 - 2xb\cos\alpha}. \]


1° Formalization. In sum, we obtain the following formalization: 


\[ (p_{13}) \qquad f_0(x, y) = x + \sqrt{x^2 + b^2 - 2xb\cos\alpha} + y + \sqrt{y^2 + a^2 - 2ya\cos\alpha} \to \min, \]

\[ f_1(x, y) = xy - ab = 0, \qquad x > 0. \]

If we express $y$ in terms of $x$ and substitute the result in the minimized
function, then we arrive at the problem

\[ (p_{13}') \qquad f_0(x) \to \min, \qquad x > 0, \]

where $f_0(x) \to \infty$ as $x \to 0$ and $f_0(x) \to \infty$ as $x \to \infty$ (check this).

By the corollary to Weierstrass’ theorem in the eleventh story, problem
$(p_{13}')$, and therefore also $(p_{13})$, has a solution. We denote it by $(\hat{x}, \hat{y})$. We
will use the Lagrange principle. The Lagrange function is

\[ \mathcal{L}(x, y, \lambda_0, \lambda_1) = \lambda_0 f_0(x, y) + \lambda_1 f_1(x, y). \]


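2° Necessary condition.

\[ \frac{\partial \mathcal{L}}{\partial x} = 0, \qquad \frac{\partial \mathcal{L}}{\partial y} = 0. \]

3° Finding the stationary points (we take $\lambda_0 = 1$ and write $\lambda$ for $\lambda_1$).

\[ \frac{\partial \mathcal{L}}{\partial x} = 0 \Rightarrow 1 - \frac{b\cos\alpha - x}{\sqrt{x^2 + b^2 - 2xb\cos\alpha}} + \lambda y = 0, \]

\[ \frac{\partial \mathcal{L}}{\partial y} = 0 \Rightarrow 1 + \frac{y - a\cos\alpha}{\sqrt{y^2 + a^2 - 2ya\cos\alpha}} + \lambda x = 0. \]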


4° Discussion. We now explain the geometric significance of these rela- 
tions. To this end we drop perpendiculars MR and MS to AC and AB, 
respectively. Then, as is clear from Figure 4.7 on page 35, we have 


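\[ \frac{b\cos\alpha - x}{\sqrt{x^2 + b^2 - 2xb\cos\alpha}} = \frac{|RE|}{|ME|} = \cos\varphi, \qquad \frac{y - a\cos\alpha}{\sqrt{y^2 + a^2 - 2ya\cos\alpha}} = \frac{|SD|}{|MD|} = \cos\psi. \]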


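If we multiply the first of the relations 3° by $x$ and the second by $y$ and
make use of the relations in 4° and the equality $xy = ab$, then we obtain
the relation $x(1 - \cos\varphi) = y(1 + \cos\psi)$. Applying the law of sines to the
triangles $MNE$ and $DPM$, we have

\[ \frac{|ME|}{\sin\alpha} = \frac{x}{\sin\psi}, \qquad \frac{|MD|}{\sin\alpha} = \frac{y}{\sin\varphi}. \]

Hence

\[ \frac{|ME|(1 - \cos\varphi)}{\sin\varphi} = \frac{|MD|(1 + \cos\psi)}{\sin\psi} \Rightarrow |ME|\tan\frac{\varphi}{2} = |MD|\tan\left(\frac{\pi}{2} - \frac{\psi}{2}\right). \]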


The geometric significance of the last relation is the following: The per- 
pendicular to DE at M and the bisectors of the exterior angles D and E 
intersect in one point O. In other words, the excircle to the triangle ADE 
passes through M. This is the answer we obtained in the fourth story. 


14. Apollonius’ problem. I said in the fourth story that problems on extrema
are found in the works of all three of the greatest mathematicians
of antiquity: Euclid, Archimedes, and Apollonius. So far I have presented
problems associated with Euclid and Archimedes.

I couldn’t bring myself to present Apollonius’ problem. Here is why. 

The title of the greatest work of Apollonius (262?–190? B.C.) is Conica,
or Conics. Conica is widely regarded as the apex of antique mathematics.
Relevant to our topic is its fifth book. Here Apollonius “treats the shortest
and the longest line segments from a point O to a conic. But he gives even
more than he promises: he determines all the lines through O that intersect 
the conic at right angles (nowadays we call them normals), he investigates 
the positions of O for which there are two, three or four solutions... ” By 
shifting the position of O “he determines the ordinates of the limiting points 
G, and G,, at which the number of normals through O jumps from 2 to 
4, or inversely.” (See Figure 13.1.) The quoted lines are from B. L. van der 
Waerden’s Science awakening (Noordhoff, 1954, pp. 260-261). 

I did not want to formulate Apollonius’ problem in the fourth story for 
two reasons. One reason is that the topic “conic sections” is not covered in 
high school. The other is that I could not imagine how to solve the problem 
discussed by van der Waerden while remaining within the framework of “old” 
elementary mathematics. And without presenting a solution, I didn’t want 
to touch on this problem in Part One. 

When talking about the brachistochrone in the seventh story I mentioned 
that the mathematicians of antiquity primarily considered lines, circles and 
conic sections (strictly speaking, these were also the curves investigated by 
Apollonius). But the curve separating the region in which the number of 
normals is two from the region where that number is four (it is called an 
astroid) belongs to an altogether different class of curves. It turned up first in 
the seventeenth century. It is difficult to see how it could be given without the 
use of the language of algebra. (Remember this when we find the equation 
of the astroid!) 

We state the problems just mentioned as follows: 


1. How does one determine the distance from a point to a conic section? 
2. How many normals can one draw from a point to a conic? 


We will solve these problems for an ellipse rather than for all conics.
The equation of an ellipse in a rectangular coordinate system is $(x_1/a_1)^2 +
(x_2/a_2)^2 = 1$. We assume that $a_1 \ge a_2 > 0$, that is, the “width” of the ellipse
is not less than its “height.” If $a_1 = a_2$, that is, if the width and height are
equal, then the ellipse becomes a circle. We will now solve the first problem.


1° Formalization. Let $O$ be a point with coordinates $(\xi_1, \xi_2)$. The distance
from a point with coordinates $(\xi_1, \xi_2)$ to one with coordinates $(x_1, x_2)$
is $((x_1 - \xi_1)^2 + (x_2 - \xi_2)^2)^{1/2}$. It is convenient to minimize the square of the
distance rather than the distance itself. In sum, we obtain the problem

\[ f_0(x_1, x_2) = (x_1 - \xi_1)^2 + (x_2 - \xi_2)^2 \to \min, \]

\[ f_1(x_1, x_2) = (x_1/a_1)^2 + (x_2/a_2)^2 - 1 = 0. \]

The functions $f_0$ and $f_1$ and their partial derivatives are continuous.
Since $-a_j \le x_j \le a_j$, $j = 1, 2$, the set of admissible points is bounded.
Hence a solution $\hat{x} = (\hat{x}_1, \hat{x}_2)$ exists, and we can use the Lagrange principle.
The Lagrange function is

\[ \mathcal{L} = \lambda_0 f_0 + \lambda_1 f_1. \]
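2° Necessary condition.

\[ \frac{\partial \mathcal{L}}{\partial x_j} = 0, \qquad j = 1, 2. \]

3° Finding the stationary points.

\[ \frac{\partial \mathcal{L}}{\partial x_1} = 0 \Rightarrow \lambda_0(\hat{x}_1 - \xi_1) + \lambda_1\hat{x}_1/a_1^2 = 0, \]

\[ \frac{\partial \mathcal{L}}{\partial x_2} = 0 \Rightarrow \lambda_0(\hat{x}_2 - \xi_2) + \lambda_1\hat{x}_2/a_2^2 = 0. \]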


If we suppose that $\lambda_0 = 0$, then $\lambda_1 \ne 0$ (the Lagrange multipliers cannot
all be zero). But then our equations imply that $\hat{x}_1 = \hat{x}_2 = 0$, that is, $0 =
f_1(\hat{x}_1, \hat{x}_2) = f_1(0, 0) = -1$. Hence $\lambda_0 \ne 0$ and we can put $\lambda_0 = 1$. We set
$\lambda_1 = \lambda$. From our equations it follows that

\[ \hat{x}_j = \frac{\xi_j a_j^2}{a_j^2 + \lambda}, \qquad j = 1, 2. \]

Substituting these relations in the equation of the ellipse we obtain the equation

\[ \varphi(\lambda) = \frac{\xi_1^2 a_1^2}{(a_1^2 + \lambda)^2} + \frac{\xi_2^2 a_2^2}{(a_2^2 + \lambda)^2} = 1. \]

FIGURE 13.2


4° Discussion. The number of stationary points in the problem (that is,
points corresponding to the values of $\lambda$ that satisfy the equation $\varphi(\lambda) = 1$)
does not exceed four (because we have obtained an equation of degree four;
see also Figure 13.2). In Figure 13.2, we have the case $\varphi(0) > 1$. Since
$\varphi(0) = \xi_1^2/a_1^2 + \xi_2^2/a_2^2$, the point $(\xi_1, \xi_2)$ lies outside the ellipse.

It is clear that it is impossible to write down the solutions of this equation 
in some simple, explicit form. But now that we have at our disposal many 
computing tools, we can find solutions of the equation $\varphi(\lambda) = 1$ very quickly
and with an arbitrary degree of precision. Once the roots $\lambda_i$ of the equation
have been computed, it will be necessary to find the corresponding points
$(x_1(\lambda_i), x_2(\lambda_i))$, substitute these values in $f_0$, and find the smallest of the
resulting numbers. 
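
As an illustration of such a computation, here is a Python sketch for the simplest situation, a point outside the ellipse. It rests on an observation that we add here as an assumption of the sketch: for an outside point the nearest point of the ellipse corresponds to the only nonnegative root of $\varphi(\lambda) = 1$, since $\varphi$ decreases from $\varphi(0) > 1$ to $0$ on $[0, \infty)$. The root is located by bisection; the function names and sample data are ours.

```python
import math


def phi(lam, xi, a):
    """phi(lambda) = sum over j of xi_j^2 a_j^2 / (a_j^2 + lambda)^2."""
    return sum((xi[j] * a[j]) ** 2 / (a[j] ** 2 + lam) ** 2 for j in range(2))


def nearest_point_on_ellipse(xi, a, iterations=200):
    """Nearest point of (x1/a1)^2 + (x2/a2)^2 = 1 to an OUTSIDE point xi."""
    if phi(0.0, xi, a) <= 1.0:
        raise ValueError("this sketch handles only points outside the ellipse")
    lo, hi = 0.0, 1.0
    while phi(hi, xi, a) > 1.0:  # enlarge the bracket until phi(hi) <= 1
        hi *= 2.0
    for _ in range(iterations):  # bisection: phi is decreasing on [0, infinity)
        mid = 0.5 * (lo + hi)
        if phi(mid, xi, a) > 1.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    point = [xi[j] * a[j] ** 2 / (a[j] ** 2 + lam) for j in range(2)]
    return point, lam


a = (2.0, 1.0)  # hypothetical semi-axes a_1 >= a_2 > 0

# A point on the major axis: the nearest point should be the vertex (2, 0).
vertex, lam_vertex = nearest_point_on_ellipse((4.0, 0.0), a)

# A point in general position.
xi = (3.0, 2.0)
x_hat, lam_hat = nearest_point_on_ellipse(xi, a)
dist_hat = math.hypot(x_hat[0] - xi[0], x_hat[1] - xi[1])
```

For points inside the ellipse all real roots of $\varphi(\lambda) = 1$ have to be found and compared, as described above; the sketch deliberately leaves that case out.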

The first of our two problems has been solved. The geometric significance 
of the relations $(\hat{x}_j - \xi_j) + \lambda\hat{x}_j/a_j^2 = 0$ is that the vector $\xi - \hat{x}$, joining $O$ and
a minimal point of the ellipse, is proportional to the gradient of $f_1$ at $\hat{x}$,
that is, the vector $\xi - \hat{x}$ lies on the normal to the ellipse. This fact was first
established by Apollonius. 

Now we’ll tackle the second problem. We will derive the equation of the 
“dividing” curve that separates the region where one can lead two normals 
through a point from the region where one can lead four. From Figure 13.2 
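it is easy to see that this division occurs for values of $\lambda$ for which $\varphi(\lambda) = 1$
and $\varphi'(\lambda) = 0$, because that is when the curve $y = \varphi(\lambda)$ touches the line
$y = 1$. In other words, we must eliminate $\lambda$ from the relations

\[ \frac{\xi_1^2 a_1^2}{(a_1^2 + \lambda)^2} + \frac{\xi_2^2 a_2^2}{(a_2^2 + \lambda)^2} = 1, \qquad \frac{\xi_1^2 a_1^2}{(a_1^2 + \lambda)^3} + \frac{\xi_2^2 a_2^2}{(a_2^2 + \lambda)^3} = 0. \]

From the second of these relations we obtain

\[ a_1^2 + \lambda = A(\xi_1 a_1)^{2/3}, \qquad a_2^2 + \lambda = -A(\xi_2 a_2)^{2/3}, \]

where

\[ A = (a_1^2 - a_2^2)/\left((\xi_1 a_1)^{2/3} + (\xi_2 a_2)^{2/3}\right). \]

Substituting these relations in the equation $\varphi(\lambda) = 1$, we arrive at the equation
of the dividing curve

\[ (\xi_1 a_1)^{2/3} + (\xi_2 a_2)^{2/3} = (a_1^2 - a_2^2)^{2/3}. \]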

This is the equation of the astroid discussed earlier (how could Apollonius 
have obtained it?). Outside the astroid each point has two normals, inside it, 
four (in particular, obviously, at the center of the ellipse), and on the astroid 
itself, three (except at the vertices, where there are two normals). 

Finally we have arrived at a result that was first obtained in the second 
century B.C. 

In 1975 the first all-Soviet Olympiad, “The student and scientific-technical 
progress,” was organized. The Olympiad was a gathering of about 100 of the 
best mathematics majors from the Soviet republics. One of the problems set 
before the participants was Apollonius’ problem: How many normals can one 
lead from a point to an ellipse? Just one person could cope with this problem. 
To tell the truth, the organizers thought that after twenty-two centuries more 
would have been achieved. 

In the summer of 1984, I coached high school students preparing to take 
part in the international mathematical Olympiad. The topic of the session 
was “Mathematical analysis.” To demonstrate the power of mathematical 
analysis, I decided to tell the students about topics that you have encountered 
in Part Two of this book. During the study sessions we arranged a kind 
of contest between analysis and geometry. I would suggest a problem, the
students would solve it geometrically, and I would solve it analytically. I was 
convinced of the superiority of analysis and hoped for an easy victory. But 
matters turned out to be anything but simple. My listeners were true lovers 
of geometry and remarkably well-trained problem-solvers. It was a small 
matter for these youngsters to think of unexpected and very elegant solutions 
that were—I thought—anything but easy to find. And they regarded them as 
trivial. There was no easy victory. Nor could it be said that it was a fiasco for 
mathematical analysis. I will now present three problems from my coaching 
with the high school students in which, I think, the “theory” acquitted itself 
very well indeed. 

The solutions I present rely on analysis. Readers are urged to try and find 
“purely geometric” solutions that are obviously simpler than these. 

The first problem was given at the all-Soviet mathematical Olympiad for
high school students in 1980. Its author is I. F. Sarygin, who is known for his
remarkable ability to invent beautiful geometric problems (see [10R], Problem 349).

FIGURE 13.3


15. Given a unit circle. Through a given point F on a diameter AB, pass 
a chord CD so that the quadrilateral ACBD has maximal area. 


1° Formalization. Let $O$ be the center of the circle, put $|OF| = a$, and 
denote the angle $CFB$ by $\varphi$. (See Figure 13.3.) Recall that the area of 
a cyclic quadrilateral is half the product of its diagonals by the sine of the 
angle between them. It is clear that $|CD| = 2\sqrt{1 - a^2\sin^2\varphi}$. 
This leads to the formalization 

$$\sqrt{1 - a^2\sin^2\varphi}\,\sin\varphi \to \max, \qquad 0 \le \varphi \le \pi/2.$$

By making the substitution $a\sin\varphi = \sqrt{z}$, we obtain the problem 

$$f(z) = (1 - z)z \to \max, \qquad 0 \le z \le a^2.$$

Weierstrass’ theorem implies the existence of a solution. 

2° Necessary condition. Fermat’s theorem: $f'(\hat z) = 0$. 

3° Finding the critical points. There is just one stationary point: 
$\hat z = 1/2$ (if $a^2 \ge 1/2$). The critical points are: $\{0, a^2\}$ if 
$a^2 < 1/2$ and $\{0, 1/2, a^2\}$ if $a^2 \ge 1/2$. 

4° Discussion. By looking at the values of $f$ at the critical points, we 
arrive at the answer: if $0 \le a \le 1/\sqrt{2}$, then $\hat z = a^2$, that is, 
$\hat\varphi = \pi/2$; if $1/\sqrt{2} < a \le 1$, then $\hat z = 1/2$, that is, 
$\hat\varphi = \arcsin\bigl(1/(a\sqrt{2})\bigr)$. 
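The answer to Problem 15 can be checked numerically. The sketch below is only an illustration, not part of the original solution: the function names `quad_area`, `best_angle_grid`, and `best_angle_formula` are invented here, and the brute-force grid search is a stand-in for the analytic argument.

```python
import math

def quad_area(a: float, phi: float) -> float:
    """Area of ACBD: half the product of the diagonals |AB| = 2 and |CD|,
    times the sine of the angle phi between them."""
    chord = 2.0 * math.sqrt(1.0 - (a * math.sin(phi)) ** 2)
    return 0.5 * 2.0 * chord * math.sin(phi)

def best_angle_grid(a: float, steps: int = 20000) -> float:
    """Brute-force maximizer of quad_area over a grid on [0, pi/2]."""
    grid = [0.5 * math.pi * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda phi: quad_area(a, phi))

def best_angle_formula(a: float) -> float:
    """Angle given in the discussion stage of the solution."""
    if a <= 1.0 / math.sqrt(2.0):
        return 0.5 * math.pi
    return math.asin(1.0 / (a * math.sqrt(2.0)))

for a in (0.3, 0.9):
    print(a, best_angle_grid(a), best_angle_formula(a))
```

For a point F close to the center the chord should be perpendicular to AB; for a point far from the center the two printed angles should agree up to the grid step.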

It was the students who gave me this problem to solve. They were con- 
vinced that their geometric solution would be incomparably simpler than the 
analytic one. But could any solution be simpler than ours? 

The next problem is also due to I. F. Sarygin. He invented it especially for 
the study session with this group. (The problem was to have been used in 
preliminary competitions, but this did not happen, so that the problem was 
unknown to my listeners. Knowing I. F. Sarygin, who coached them, they 
looked forward to the usual easy triumph of geometry.) This is the problem 
in question (see [10R], Problem 348). 


16. Given an angle BAC and two points M and N in its interior, pass 
a segment DE through M (using ruler and compass) such that the area of 
the quadrilateral ADNE is minimal. (See Figure 13.4.) 


1° Formalization. Pass segments $MF$ and $MG$ parallel to the sides $AC$ 
and $AB$, respectively, and denote their lengths by $a$ and $b$. Drop 
perpendiculars from $N$ to $AB$ and $AC$ and denote their lengths by $d$ and 
$c$. Then twice the area of $ADNE$ is equal to $(b + x)d + (a + y)c$, where 
$x = |FD|$ and $y = |GE|$. Also, $xy = ab$; this follows directly from the 
similarity of the triangles $DFM$ and $MGE$. Hence 

$$f_0(x, y) = (b + x)d + (a + y)c \to \min, \qquad f_1(x, y) = xy - ab = 0.$$

The existence of a solution $(\hat x, \hat y)$ follows (think about it!) from 
Weierstrass’ theorem ($f_0$ goes to infinity with $x$). The functions $f_0$ and 
$f_1$ and their partial derivatives are continuous, so that we can use the 
Lagrange principle. The Lagrange function is 

$$\mathcal{L} = \lambda_0 f_0 + \lambda_1 f_1.$$

2° Necessary condition. 

$$\frac{\partial\mathcal{L}}{\partial x} = 0, \qquad \frac{\partial\mathcal{L}}{\partial y} = 0.$$

3° Finding the stationary points. 

$$\frac{\partial\mathcal{L}}{\partial x} = 0 \Rightarrow \lambda_0 d + \lambda_1 \hat y = 0, \qquad
\frac{\partial\mathcal{L}}{\partial y} = 0 \Rightarrow \lambda_0 c + \lambda_1 \hat x = 0.$$

Clearly $\lambda_0 \ne 0$ (otherwise $\hat x = \hat y = 0$, that is, 
$\hat x \hat y = 0 \ne ab$), so that we can put $\lambda_0 = 1$. In sum, we 
obtain the system of equations 

$$\hat x \hat y = ab, \qquad \hat x / \hat y = c/d.$$

FIGURE 13.5 (the angle is arbitrary)          FIGURE 13.6 


4° Construction. Using the relation $\xi/b = a/d$, we construct (using ruler 
and compass) the segment $\xi$ (a glance at Figure 13.5 will remind you how 
to do this). Using the relation $\hat x^2 = c\xi \Rightarrow \hat x = \sqrt{c\xi}$, 
we construct $\hat x$ (again by means of ruler and compass; see Figure 5.2 on 
p. 40). 
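The stationary point of Problem 16 can be compared with a direct scan of the objective. This is a minimal sketch under stated assumptions: the function names and the sample values of a, b, c, d are invented for illustration, and the constraint xy = ab is eliminated by substituting y = ab/x.

```python
import math

def doubled_area(x, a, b, c, d):
    """Twice the area of ADNE as a function of x = |FD|, with y = ab/x."""
    y = a * b / x
    return (b + x) * d + (a + y) * c

def stationary_x(a, b, c, d):
    """x from the system x*y = ab, x/y = c/d, i.e. x**2 = abc/d."""
    return math.sqrt(a * b * c / d)

# Hypothetical sample data (not from the text).
a, b, c, d = 2.0, 3.0, 1.5, 0.7
x_hat = stationary_x(a, b, c, d)
scan = min((0.05 * k for k in range(1, 400)),
           key=lambda x: doubled_area(x, a, b, c, d))
print(x_hat, scan)  # the scanned minimizer should lie next to x_hat
```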

There are purely geometric solutions. In particular there is the very beauti- 
ful solution due to... the author! These solutions lead to other construction 
methods. Find them and compare. My students have not managed to come 
up with a simpler geometric solution! 


17. Among all pyramids with given base and height find the one with least 
lateral area. 


1° Formalization. (See Figure 13.6.) Let the base of the pyramid be a 
triangle $A_1A_2A_3$ with sides of length $a_1$, $a_2$, and $a_3$ 
($a_j > 0$, $j = 1, 2, 3$), and let $O$ be the projection of the vertex on the 
base plane. Denote by $H$ the height of the pyramid and by $h_1$, $h_2$, and 
$h_3$ the distances from $O$ to the lines containing the sides of lengths 
$a_1$, $a_2$, and $a_3$, respectively (they are to be taken with a plus sign if 
$O$ lies in the same halfplane as the triangle $A_1A_2A_3$ and with a minus 
sign otherwise). Then we have the familiar (and obvious) equality 
$a_1h_1 + a_2h_2 + a_3h_3 = 2S$, where $S$ is the base area. Also, the lateral 
area of the pyramid is 

$$\tfrac12\Bigl(a_1\sqrt{H^2 + h_1^2} + a_2\sqrt{H^2 + h_2^2} + a_3\sqrt{H^2 + h_3^2}\Bigr)$$

(the total area exceeds it by the constant $S$). This leads to the problem 

$$f_0(h_1, h_2, h_3) = a_1\sqrt{H^2 + h_1^2} + a_2\sqrt{H^2 + h_2^2} + a_3\sqrt{H^2 + h_3^2} \to \min,$$
$$f_1(h_1, h_2, h_3) = a_1h_1 + a_2h_2 + a_3h_3 - 2S = 0.$$

By Weierstrass’ theorem (think this through!) the problem has a solution 
($f_0$ is increasing at infinity). The functions $f_0$ and $f_1$ and their 
partial derivatives are continuous. This means that a solution 
$\hat h = (\hat h_1, \hat h_2, \hat h_3)$ exists, and we can use Lagrange’s 
principle. The Lagrange function is $\mathcal{L} = \lambda_0 f_0 + \lambda_1 f_1$. 

2° Necessary condition. 

$$\frac{\partial\mathcal{L}}{\partial h_j} = 0, \qquad j = 1, 2, 3.$$

3° Finding the stationary points. 

$$\frac{\partial\mathcal{L}}{\partial h_j} = 0 \Rightarrow
\lambda_0 \frac{a_j \hat h_j}{\sqrt{H^2 + \hat h_j^2}} + \lambda_1 a_j = 0, \qquad j = 1, 2, 3.$$

Clearly, $\lambda_0 \ne 0$, and we can assume that $\lambda_0 = 1$. 

4° Discussion. Dividing the resulting equations by $a_j$, we see that 
$\hat h_j / \sqrt{H^2 + \hat h_j^2}$ has the same value for $j = 1, 2, 3$. Since 
$t / \sqrt{H^2 + t^2}$ is an increasing function of $t$, we immediately 
find that $\hat h_1 = \hat h_2 = \hat h_3$, that is, the foot of the altitude is 
the center of the incircle. 
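A quick numerical check of this answer, offered as a sketch rather than as part of the solution: the helper names, the 3-4-5 base triangle, and the height 2 are all assumptions made for the example. The code compares the lateral area for the apex above the incenter with the lateral area for the apex above other points of the base plane.

```python
import math

def dist_to_line(p, q, r):
    """Distance from the point p to the line through q and r (plane coordinates)."""
    cross = (r[0] - q[0]) * (p[1] - q[1]) - (r[1] - q[1]) * (p[0] - q[0])
    return abs(cross) / math.dist(q, r)

def lateral_area(p, tri, height):
    """Sum of the three lateral face areas when the apex projects to p."""
    total = 0.0
    for i in range(3):
        q, r = tri[i], tri[(i + 1) % 3]
        slant = math.hypot(height, dist_to_line(p, q, r))  # altitude of the face
        total += 0.5 * math.dist(q, r) * slant
    return total

def incenter(tri):
    """Incenter as the average of the vertices weighted by opposite side lengths."""
    a1, a2, a3 = tri
    w1, w2, w3 = math.dist(a2, a3), math.dist(a3, a1), math.dist(a1, a2)
    s = w1 + w2 + w3
    return ((w1 * a1[0] + w2 * a2[0] + w3 * a3[0]) / s,
            (w1 * a1[1] + w2 * a2[1] + w3 * a3[1]) / s)

tri = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]   # hypothetical base triangle
center = incenter(tri)
print(center, lateral_area(center, tri, 2.0), lateral_area((2.0, 0.5), tri, 2.0))
```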

After a little reflection, one of my listeners gave the right answer. I asked 
him to come up to the board and expected to hear once more a “purely 
geometric” solution accompanied by ironic comments. Quite unexpectedly, 
I saw the functions $f_0$ and $f_1$, the Lagrange function, its partial derivatives 
and—the answer. 

Long live mathematical analysis! Don’t you agree? 

I waited impatiently for news from the twenty-fifth international mathe- 
matical Olympiad that took place in Prague in 1984. First I learned that our 
team performed with distinction; the youngsters won five first prizes and one 
second prize and collected 225 points. No team ever performed with such 
distinction in all the history of the Olympiad. Then I saw the problems from 
the competition. Of course, I was especially interested in the problems in 
which one could use the methods of investigation of extremal problems. One 
such problem appeared. 


18. Let x, y, and z be nonnegative real numbers with x + y + z = 1. 
Show that $0 \le xy + yz + xz - 2xyz \le 7/27$. 


1° Formalization. 

$$f_0(x, y, z) = xy + yz + xz - 2xyz \to \max(\min),$$
$$f_1(x, y, z) = x + y + z - 1 = 0, \qquad x \ge 0,\ y \ge 0,\ z \ge 0.$$

By Weierstrass’ theorem, the problem allows a solution in the case of a 
maximum as well as a minimum. Suppose that $(\hat x, \hat y, \hat z)$ is a 
solution with nonzero entries. Then this solution will yield a local extremum 
for the problem $f_0(x, y, z) \to \max(\min)$, $f_1(x, y, z) = 0$, and one can 
apply to it the Lagrange multiplier rule. 

2° Necessary condition (for the Lagrange function 
$\mathcal{L} = \lambda_0 f_0 - \lambda_1 f_1$): 

(i) $\partial\mathcal{L}/\partial x = 0 \Rightarrow \lambda_0(\hat y + \hat z - 2\hat y\hat z) = \lambda_1$, 
(ii) $\partial\mathcal{L}/\partial y = 0 \Rightarrow \lambda_0(\hat x + \hat z - 2\hat x\hat z) = \lambda_1$, 
(iii) $\partial\mathcal{L}/\partial z = 0 \Rightarrow \lambda_0(\hat x + \hat y - 2\hat x\hat y) = \lambda_1$. 

Clearly, $\lambda_0 \ne 0$ (otherwise $\lambda_1 = 0$). We put $\lambda_0 = 1$. 

3° Finding the stationary points. Subtracting the second equation from 
the first (with $\lambda_0 = 1$), we get 

$$\hat y - \hat x - 2\hat z(\hat y - \hat x) = 0 \Rightarrow \hat z = \tfrac12 \ \text{or}\ \hat y = \hat x.$$

Similarly, $\hat y = \hat z$, or $\hat x = 1/2$, and $\hat x = \hat z$, or 
$\hat y = 1/2$. 

4° Discussion. If one of the numbers, say $\hat z$, is $1/2$, then 
$f_0(\hat x, \hat y, 1/2) = (\hat x + \hat y)/2 = 1/4 < 7/27$. If none of these 
numbers is $1/2$, then there is a unique stationary point 
$\hat x = \hat y = \hat z = 1/3$ and $f_0(1/3, 1/3, 1/3) = 7/27$. Finally, if the 
solution has a zero component, say $\hat x = 0$, then 
$0 \le f_0(0, y, z) = yz \le 1/4$. Answer: the maximum $7/27$ is attained for 
$\hat x = \hat y = \hat z = 1/3$, and the minimum, zero, is attained for, say, 
$\hat x = \hat y = 0$, $\hat z = 1$. 
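The two bounds can also be confirmed by exact arithmetic on a grid of the simplex x + y + z = 1. The sketch below is an illustration only; the names `f0` and `extreme_values` and the grid size are choices made here, and a grid check is of course not a proof.

```python
from fractions import Fraction

def f0(x, y, z):
    """The function of Problem 18."""
    return x * y + y * z + x * z - 2 * x * y * z

def extreme_values(n: int):
    """Exact min and max of f0 over the grid x = i/n, y = j/n, z = 1 - x - y."""
    values = []
    for i in range(n + 1):
        for j in range(n + 1 - i):
            x, y = Fraction(i, n), Fraction(j, n)
            values.append(f0(x, y, 1 - x - y))
    return min(values), max(values)

# n = 30 puts the point (1/3, 1/3, 1/3) on the grid.
print(extreme_values(30))
```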

One more geometric problem.* It is tied to a memory. This was a long 
time ago, in fact, more than 30 years ago. The leader of the mathematical 
circle smiled enigmatically and asked: “One tetrahedron lies inside another. 
Can the sum of its edges be greater than the sum of the edges of the outer 
tetrahedron?” At first, this seemed a total impossibility. How can anything 
about the inner tetrahedron be greater? But it turns out that the sum of 
the edges can indeed be larger for the smaller tetrahedron. During the sec- 
ond round of the sixteenth all-Soviet mathematical Olympiad, tenth-grade 
students were given the following problem. 


19. The vertices of a tetrahedron KLMN lie inside, on the faces, or on the 
edges of another tetrahedron ABCD. Show that the sum of the lengths of all 
edges of the tetrahedron KLMN is less than 4/3 of the sum of the lengths 
of all edges of the tetrahedron ABCD. 

This is an interesting example. It shows that it is sometimes possible to 
solve a problem without the standard investigation, for a great deal becomes 
immediately clear after the formalization stage. 

* For more on this problem see the author’s paper in the journal Kvant, 1983, 1, 
pp. 22-25 (in Russian). 

The tetrahedron $ABCD$ is a convex, closed, and bounded set. Denote it 
by $X$. $X$ can be given by means of four inequalities: 

$$X = \{x = (x_1, x_2, x_3) \mid (x, a^i) \le \alpha_i,\ i = 1, 2, 3, 4\},$$

where $(x, y)$ is the scalar product of $x$ and $y$. 

Our problem can be formalized as follows: 

$$f(x^1, x^2, x^3, x^4) = |x^1 - x^2| + |x^1 - x^3| + |x^1 - x^4| + |x^2 - x^3| + |x^2 - x^4| + |x^3 - x^4| \to \max,$$
$$x^i \in X, \qquad i = 1, 2, 3, 4.$$

Here $x^k = (x^k_1, x^k_2, x^k_3)$, $k = 1, 2, 3, 4$, are points in 
three-dimensional space, and $|x - y|$ is the distance, in that space, between 
$x$ and $y$. 

The function $f$ is a continuous function of $4 \cdot 3 = 12$ variables. The 
tetrahedron is a closed and bounded set given by four inequalities. By 
Weierstrass’ theorem, the problem has a solution. Denote it by 
$(\hat x^1, \hat x^2, \hat x^3, \hat x^4)$. In view of the strict convexity of 
$f$ (that follows, think this through, from the properties of the distance 
function between a fixed point and a given point), it follows immediately 
that the points $\hat x^i$ must coincide with the vertices of $X$. In fact, if, 
for example, $\hat x^1$ is not a vertex, then there is a segment $[y, z]$ where 
$y$ and $z$ are points in $X$ and $\hat x^1 = (y + z)/2$. But then, in view of 
the property of a strictly convex function, 

$$f(\hat x^1, \hat x^2, \hat x^3, \hat x^4)
= f\Bigl(\frac{y + z}{2}, \hat x^2, \hat x^3, \hat x^4\Bigr)
< \tfrac12 f(y, \hat x^2, \hat x^3, \hat x^4) + \tfrac12 f(z, \hat x^2, \hat x^3, \hat x^4).$$

This implies that at one of the points $(y, \hat x^2, \hat x^3, \hat x^4)$ and 
$(z, \hat x^2, \hat x^3, \hat x^4)$ the function takes on a value larger than 
$f(\hat x^1, \hat x^2, \hat x^3, \hat x^4)$. This is obviously a contradiction. 
All that remains now is a simple sorting step. We denote the perimeters of 
the tetrahedra $KLMN$ and $ABCD$ by $P_{KLMN}$ and $P_{ABCD}$, respectively. 
If all points $K$, $L$, $M$, and $N$ are different, then the tetrahedron $KLMN$ 
coincides with $ABCD$ and all is clear, for 

$$P_{KLMN} = P_{ABCD} < \tfrac43 P_{ABCD}.$$

Suppose that at most two of the vertices of the tetrahedron $KLMN$ 
coincide at any one point, say, $K = L = A$. Then there are two possibilities. 

1. The two other vertices also coincide, say $M = N = B$, in which case 
the triangle inequality implies that 

$$P_{KLMN} = 4|AB| = \tfrac43 \cdot 3|AB|
< \tfrac43\bigl[|AB| + (|AC| + |CB|) + (|AD| + |DB|)\bigr] < \tfrac43 P_{ABCD}.$$

2. The vertices $M$ and $N$ are different, say, $M = B$, $N = C$; then the 
triangle inequality implies that 

$$|AB| < |AD| + |DB|, \qquad |AC| < |AD| + |DC|;$$
$$|AD| < \tfrac13\bigl[|AD| + (|AC| + |CD|) + (|AB| + |BD|)\bigr] < \tfrac13 P_{ABCD}$$
$$\Rightarrow\ P_{KLMN} = 2|AB| + 2|AC| + |BC|
< |AB| + |AD| + |DB| + |AC| + |AD| + |DC| + |BC|
= P_{ABCD} + |AD| < \tfrac43 P_{ABCD}.$$

If three vertices coincide, say, $K = L = M = A$, $N = B$, then the inequality 
in 1 above implies that $P_{KLMN} = 3|AB| < 4|AB| < \tfrac43 P_{ABCD}$. Thus, the 
problem has been solved. 

It is easy to see that the number $4/3$ in the statement of the problem 
cannot be decreased. It is attained for the degenerate tetrahedron $ABCD$ 
whose three vertices $A$, $B$, and $C$ coincide. Indeed, suppose that $A$, $B$, 
and $C$ coincide with $A$. Then in the tetrahedron $KLMN$ two vertices 
coincide with $A$ and two with $D$. But then $P_{ABCD} = 3|AD|$ and 
$P_{KLMN} = 4|AD|$. 
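The sorting step lends itself to a mechanical check: once the vertices of KLMN are known to sit at vertices of ABCD, there are only 4^4 = 256 placements to compare. The following sketch (with invented names `perimeter` and `worst_ratio`, and randomly generated outer tetrahedra as an assumption of the experiment) enumerates them.

```python
import itertools
import math
import random

def perimeter(points):
    """Sum of the six pairwise distances between four points."""
    return sum(math.dist(p, q) for p, q in itertools.combinations(points, 2))

def worst_ratio(outer):
    """Largest P_KLMN / P_ABCD when K, L, M, N run over the vertices of ABCD."""
    p_outer = perimeter(outer)
    return max(perimeter(choice) / p_outer
               for choice in itertools.product(outer, repeat=4))

random.seed(0)
outer = [tuple(random.random() for _ in range(3)) for _ in range(4)]
print(worst_ratio(outer))          # stays below 4/3

# A nearly degenerate tetrahedron: A, B, C almost coincide.
flat = [(0.0, 0.0, 0.0), (1e-3, 0.0, 0.0), (0.0, 1e-3, 0.0), (0.0, 0.0, 1.0)]
print(worst_ratio(flat))           # approaches 4/3 from below
```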


The Fourteenth Story 

What Happened Later 
in the Theory of Extremal Problems? 


We shall consider the simplest maximum and mini- 
mum problem that points to a natural transition from 
functions of a finite number of variables to magni- 
tudes that depend on an infinite number of variables. 


V. Volterra 


The methods I set forth require neither constructions 
nor geometric or mechanical considerations. They re- 
quire only algebraic operations subject to a systematic 
and uniform course. 


J. Lagrange 


An old French mathematician said: “A mathematical 
theory can be regarded as perfect only if you are pre- 
pared to present its contents to the first man in the 
street.” 


D. Hilbert 


1. On the history of mathematical analysis. The development of methods 
of solution of maximum and minimum problems is inextricably linked to the 
history of mathematical analysis. We have touched on this topic many times. 
Now we will tie together much of what was told before. 

Recall that at first maximum and minimum problems were solved individ- 
ually, each problem giving rise to a particular solution. At the beginning of 
the seventeenth century, there arose the need to find some general methods 
of investigation of extremal problems. Descartes attempted to find algebraic 
means of locating maxima and minima. Fermat was the first to employ for 
such purposes what we now call the differential calculus. According to his 
own words, he had discovered his method as early as 1629. However, the 
first relatively detailed account of the method is found in his letters to Rober- 
val (sent in 1636) and to Mersenne, who forwarded his copy to Descartes. 
Descartes received it in 1638. 

“The whole theory of evaluation of maxima and minima presupposes ... 
the following single rule,” wrote Fermat, who then went on to present the 
essence of his method as discussed in the tenth story. 

The reader would do well to read Fermat’s paper (see pp. 223-227 of 
A Source Book in Mathematics, edited by D. J. Struik, Harvard University 
Press, 1969) in order to find out how he managed to describe his method 
without using the yet-to-be-invented notion of a derivative. 

Fermat supported his theoretical arguments with an example: “To divide 
the segment AC at E so that the rectangle with sides AE and EC may 
be maximal (in terms of area).” It is easy to see that this is the very same 
problem that Euclid posed and solved geometrically in his Elements (see the 
fourth story). This example (in Fermat’s formulation) was analyzed in the 
fifth story. 

In 1671 Newton completed his Of the methods of series and fluxions with 
application to the geometry of curves [10]. This work was not published until 
1736. Here Newton laid the foundations of the differential and integral 
calculus and of the theory of infinite series. Of course, Newton also paid 
attention to finding maxima and minima. He mentions Fermat’s method 
in passing without mentioning Fermat’s name. He writes: “... seek its 
fluxion [that is, the derivative of a quantity] and set it equal to nothing” [10]. 
Newton solves two significant examples involving implicit functions, one of 
which contains a radical. Then he writes: “Using the method of solution of 
this problem one can obtain the solutions of the following problems,” and 
lists nine geometric problems that he can solve. And again the first of these 
is a problem equivalent to Euclid’s. 

In 1684 Leibniz published a work in which he also laid the foundations of 
mathematical analysis. Its very title, beginning with the words A new method 
for maxima and minima ... , shows the importance of the role of the problem 
of finding extrema in the formation of modern mathematics. In his paper 
Leibniz not only finds the necessary condition $f'(x) = 0$, but he also uses 
the second differential to distinguish between a maximum and a minimum 
(incidentally, this was also known at the time to Newton). With the help 
of the relation $f'(x) = 0$, Leibniz solves a number of concrete problems, 
including the derivation of Snel’s law (third story). 

Leibniz’ works were significantly ahead of their time. In them one can 
already discern the thought of linear approximation of functions, of the con- 
nection between the tangent and the derivative. This idea underwent an 
interesting evolution, described in §5. 

The research of Fermat, Newton, and Leibniz promoted the emergence of 
the method of finding extrema of functions of one variable. It seems that it 
would have been natural to study next extrema of functions of two variables, 
of three variables, and so on. But this is not what happened. The history 
of analysis made a kind of zigzag and immediately embarked on the study 
of functions of infinitely many variables. It took decades for it to return to 
strictly finite-dimensional problems. 

In Newton’s problem (eighth story), in the brachistochrone problem (sev- 
enth story), and in the classical isoperimetric problem (second story), “arbi- 
trary curves” are tested. These curves cannot be given by one, two, or any 
arbitrary finite number of parameters. Their “arbitrary rule” includes “an 
infinitely large number of variables.” Small wonder we were unable to solve 
these three problems in the previous story. 

The elaboration of a theory of problems similar to these three began at the 
end of the seventeenth century. A special “calculus” of such problems was 
created, taking shape in the eighteenth and nineteenth centuries in the works 
of Euler, Lagrange, Weierstrass, and others. This theory came to be known 
as the calculus of variations. 

Analysis of functions of a finite number of variables was developed some- 
what later. And then, relatively recently, it was understood that mathemat- 
ical analysis of an infinite number of variables is not, in principle, more 
complex than finite-dimensional analysis. Once again the thought of creating 
an infinite-dimensional analysis was launched by the need to solve extremal 
problems (see Volterra’s epigraph; Volterra has in mind the classical isoperi- 
metric problem). Let’s examine this topic. 


2. What is a function of an infinite number of variables? In the ninth story 
we discussed the question: What is a function? First we looked at functions 
of one variable, where we associate to a single number x a number y in 
accordance with a definite rule. Then we examined the matter of a function of 
two variables, where we associate to a pair of numbers $(x_1, x_2)$ a number y 
(again, in accordance with a definite rule). Finally we also discussed functions 
of n variables. But even before the ninth story we encountered (on a number 
of occasions) functions of infinitely many variables, where the variables were 
themselves functions. (Recall formula (2) in the seventh story and formula 
(8') in the eighth.) Mathematicians had studied such functionals (the usual 
name of functions defined on functions) for almost two hundred years before 
they learned to handle functions of an infinite number of variables with as 
much dispatch as functions of one variable. 

Consider the set (in mathematics we also say the space) of all functions 
continuous on a segment $[a, b]$ of the real line. This space is denoted by 
$C([a, b])$. Now let these functions play the role of the variable. We’ll try 
to interpret this fact. We recall the definition of a function of one variable, 
that is, a function defined on the real line $\mathbf{R}$ (say the function 
$y = y(x) = \sqrt{1 + x^2}$). This, we remember, is a rule that enables us to 
obtain the number $y$ for a given number $x$ (in the concrete example we must 
square $x$, add one, and take the square root of the sum). 

Now we’ll try to understand the nature of a function $F(y)$ on the space 
$C([a, b])$. By the very meaning of “function” this must be a rule that enables 
us to compute the number $F(y)$ for a given continuous function $y(x)$ on 
$[a, b]$. Let’s consider some examples. We will set $a = 0$ and $b = 1$ for 
definiteness. 

EXAMPLE 1. $F_1(y) = 2y(0)$. 

What has been prescribed? Given a function $y(x)$ we must first compute 
its value at zero and then multiply this value by 2. Let’s recall some familiar 
functions and compute the values $F_1(y)$ for them. Thus if $y(x) = x$, then 
$y(0) = 0$ and therefore $F_1(y) = 0$; if $y(x) = \sqrt{1 + x^2}$, then 
$y(0) = 1$ and $F_1(y) = 2$; if $y(x) = 5 \cdot 2^x$, then $y(0) = 5$ and 
$F_1(y) = 10$. 

Now imagine the following game: I give you, one after another, a number 
of functions, and for each you are to compute $F_1(y)$. For example, I give 
you $y(x) = 3\cos(x + 2)$, or $5\ln(x + 3)$, or some such. I think that in these 
cases you’ll find it easy to compute a value for $F_1(y)$. To understand the 
rules of this game is to understand what is meant by the function $F_1$. 


EXAMPLE 2. $F_2(y) = \int_0^1 y(x)\,dx$. (This function is just the area under 
the graph of $y(x)$.) Again, let’s play the same game. I give you $y(x) = 1$ and 
you compute for me $F_2(y) = 1$; my move is $y(x) = x$ and your response is 
$F_2(y) = 1/2$; my move is $y(x) = \sin x$ and your response is 
$F_2(y) = 1 - \cos 1$, and so on. We have become acquainted with a very 
important functional: the area functional. 

We can think of an even cleverer example. 

EXAMPLE 3. $F_3(y) = (2y(0))^3 - \bigl(\int_0^1 y(x)\,dx\bigr)^2$. Here the 
prescription is more involved. Given a function $y(x)$ you must compute $y(0)$, 
multiply this number by 2, cube the resulting number, then compute the 
integral of $y(x)$ on $[0, 1]$, square it, and subtract this “square” from the 
earlier “cube.” For example, suppose you are given $y(x) = x$. Then your 
response is $-1/4$. If you are given $y(x) = 5 \cdot 2^x$, then getting the right 
response will be a bit difficult. What counts, however, is that “in principle” 
you can carry out the task and obtain the number $F_3(y)$ for each given 
(well-defined) function $y(x)$. 
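The “game” can be played by machine. In the sketch below a functional is simply a Python function that takes a function as its argument; the names `integral`, `F1`, `F2`, `F3` mirror the three examples, while the use of Simpson’s rule for the integral is an assumption made here for the sake of a runnable illustration.

```python
import math

def integral(y, lo=0.0, hi=1.0, n=1000):
    """Composite Simpson rule for a function y on [lo, hi] (n must be even)."""
    h = (hi - lo) / n
    total = y(lo) + y(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * y(lo + k * h)
    return total * h / 3.0

def F1(y):
    """Example 1: twice the value at zero."""
    return 2.0 * y(0.0)

def F2(y):
    """Example 2: the area functional on [0, 1]."""
    return integral(y)

def F3(y):
    """Example 3: the 'cube' minus the 'square'."""
    return F1(y) ** 3 - F2(y) ** 2

print(F1(lambda x: 5 * 2 ** x))   # my move: 5 * 2^x
print(F2(math.sin))               # close to 1 - cos 1
print(F3(lambda x: x))            # close to -1/4
print(F3(lambda x: 5 * 2 ** x))   # the "bit difficult" response, done numerically
```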

In the ninth story I wrote, “Now let’s define and represent some of the 
most important functions ... .” Now that we face an infinite-dimensional 
space, any “representing” is difficult. But “defining” is a different matter, and 
we'll give it a try. 

The simplest function is a constant, F(y) = c. This function associates 
to every continuous function y(x) one and the same number c. 

Next in the order of complexity are linear functions. What does “linear 
function” of a function mean? It means that it associates to the sum of any 
functions a sum of numbers (that is, $F(y_1 + y_2) = F(y_1) + F(y_2)$) and, in 
addition, $F(\alpha y) = \alpha F(y)$ for any number $\alpha$ and any function 
$y(x)$. 

The functions in Examples 1 and 2 above are linear. The function in 
the third example is not linear. Here, in the infinite-dimensional case, there 
is an abundance of linear functions. (In a sense, there are “more” linear 
functions than continuous functions. Thus, if $\varphi$ is a continuous function, 
then we can associate to it the linear functional 

$$F_\varphi(y) = \int_0^1 \varphi(x)\,y(x)\,dx,$$

whereas the linear functional $F_1(y) = 2y(0)$ cannot be so represented.) 
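Linearity of a functional can be probed numerically on sample functions. This is a sketch under assumptions: the helper `linearity_defect`, the choice of test functions, and the Simpson-rule integral are all introduced here for illustration, and a zero defect on a few samples is evidence of linearity, not a proof.

```python
import math

def integral(y, n=1000):
    """Composite Simpson rule on [0, 1] (n even)."""
    h = 1.0 / n
    total = y(0.0) + y(1.0)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * y(k * h)
    return total * h / 3.0

F1 = lambda y: 2.0 * y(0.0)                 # Example 1
F2 = integral                               # Example 2
F3 = lambda y: F1(y) ** 3 - F2(y) ** 2      # Example 3

def linearity_defect(F, y1, y2, alpha=3.0):
    """|F(alpha*y1 + y2) - (alpha*F(y1) + F(y2))|; zero for a linear functional."""
    combo = lambda x: alpha * y1(x) + y2(x)
    return abs(F(combo) - (alpha * F(y1) + F(y2)))

for name, F in (("F1", F1), ("F2", F2), ("F3", F3)):
    print(name, linearity_defect(F, math.cos, math.exp))
```

The first two defects should vanish up to rounding, while the third is far from zero.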

Infinite-dimensional analysis studies functions of “an infinite number of 
variables,” more precisely, functionals on infinite-dimensional space (like 
the space $C([a, b])$). We'll give an example of another important 
infinite-dimensional space with which the calculus of variations operated, 
basically, for two centuries. This is the space $C^1([a, b])$ of continuously 
differentiable functions $y(x)$, that is, functions $y(x)$ that are continuous, 
together with their derivatives, on the segment $[a, b]$. 

On the space $C^1([a, b])$ there are defined functionals that have important 
geometric or physical meanings. Let’s look at some examples. 

EXAMPLE 4. The “length” functional: 

$$L(y) = \int_a^b \sqrt{1 + (y'(x))^2}\,dx.$$

EXAMPLE 5. The functional of Johann Bernoulli, “the time of motion 
along a curve” (see formula (2) in the seventh story): 

$$T(y) = \int_a^b \frac{\sqrt{1 + (y'(x))^2}}{\sqrt{2g\,y(x)}}\,dx.$$

EXAMPLE 6. Newton’s functional, the resistance to motion in a rare medium 
(see formula (8') in the eighth story): 

$$F(y) = 2K \int_0^R \frac{x\,dx}{1 + (y'(x))^2}.$$
There is an endless supply of such examples. Rather than give more of 
them, let’s address the question of how one poses extremal questions for 
functionals. In the ninth story we saw that to formulate precisely an extremal 
problem we must describe the function to be maximized or minimized as 
well as the constraints. Recall that the constraints are usually given by 
equalities and inequalities. In the infinite-dimensional case nothing changes. 
Here we must also describe the functional to be maximized or minimized (and 
thus also the space on which it is defined) and the constraints. 

We'll now formalize those problems from Part One that we have so far 
not solved in Part Two. In all cases we'll consider problems in the space $C^1$. 


Dido’s problem. Recall the story of Dido (see the second story). After 
an analysis of the situation of the Phoenician princess, we will submit the 
following two possibilities for stating the optimization problem. 


A) DIDO’S FIRST PROBLEM, OR THE CLASSICAL ISOPERIMETRIC PROBLEM. To 
determine the optimal shape of a piece of land that would, for a given length 
of its perimeter $l$, have maximal area. 


We considered this problem in the second story. Other formulations can be 
obtained if we make the reasonable assumption that Dido wished to secure 
access to the sea. For the sake of simplicity, let’s consider the case of a 
rectilinear shoreline and assume that Dido was shown boundaries she was 
not to cross. (See Figure 14.1.) Then we obtain 


B) DIDO’S SECOND PROBLEM. Among all arcs of length $l$ in the halfstrip 
$0 \le x \le a$, $y \ge 0$ with prescribed endpoints $(0, 0)$ and $(a, 0)$, find an 
arc that, together with the segment $y = 0$, $0 \le x \le a$, bounds a figure of 
maximal area. 


We'll limit ourselves to the formalization of the second problem.* Let 
$y = y(x)$ be the equation of the arc. Earlier we encountered the functionals 
“area” and “length.” Bearing in mind their definitions, we arrive at the 
following formulation: 

$$S(y) = \int_0^a y(x)\,dx \to \max, \qquad L(y) = \int_0^a \sqrt{1 + (y'(x))^2}\,dx = l$$

with boundary conditions $y(0) = 0$, $y(a) = 0$. 

The functional to be maximized is area. The constraint is given by the 
equality $L(y) = l$, where $L(y)$ is the length functional. The boundary 
conditions are also given by equalities $\Gamma_1(y) = 0$, $\Gamma_2(y) = 0$, where 
$\Gamma_1(y) = y(0)$ and $\Gamma_2(y) = y(a)$. 

* The formalization of the first problem requires consideration of functionals 
of a pair of functions. We'll describe such problems at the end of this story. 
Also, the solution of the first problem is easy to obtain from that of the 
second. 


The brachistochrone problem. Essentially, we formalized this problem in 
the seventh story. Recalling formula (2) from that story, we obtain the 
required formalization 

$$T(y) = \int_0^a \frac{\sqrt{1 + (y'(x))^2}}{\sqrt{2g\,y(x)}}\,dx \to \min$$

with boundary conditions prescribing $y(0) = 0$ and the value of $y(a)$. 
Here the functional to be minimized is the Bernoulli functional. 


Newton’s problem. Actually, we formalized this problem in the eighth 
story. The formalization is 

$$F(y) = 2K \int_0^R \frac{x\,dx}{1 + (y'(x))^2} \to \min, \qquad y'(x) \ge 0,$$

with boundary conditions prescribing $y(0) = 0$ and the value of $y(R)$. 

We pay special attention to the constraint $y'(x) \ge 0$, the monotonicity 
condition. We have encountered it only once before. 

Recall that we solved all these problems in different ways. But all solutions 
had one thing in common: In all of them, we approximated the curve by a 
polygonal line and in this way reduced the problem to a finite-dimensional 
one. 

Our method of solution of problems of this kind was implemented by 
Euler, whose predecessor was Leibniz. This method and its modifications are 
known as the direct methods of the calculus of variations. They are used to 
this day for the numerical solution of problems in the calculus of variations. 
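The reduction to polygonal lines is easy to try out on the Bernoulli functional. The sketch below is an illustration under assumptions, not the procedure of the earlier stories: the value of `G`, the endpoints (0, 0) and (pi, 2), and the function names are chosen here. It uses the fact that on a straight segment a point sliding without friction has constant acceleration, so the time spent on the segment equals its length divided by the mean of the entry and exit speeds.

```python
import math

G = 9.81  # gravitational acceleration; the numerical value is an assumption

def polygon_time(points):
    """Descent time along a polygonal line, starting at rest.

    Each point is (x, y) with y measured downward from the starting level,
    so the speed at depth y is sqrt(2 * G * y).
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        v0, v1 = math.sqrt(2 * G * y0), math.sqrt(2 * G * y1)
        total += 2.0 * math.hypot(x1 - x0, y1 - y0) / (v0 + v1)
    return total

def cycloid_polygon(n):
    """n-segment polygon inscribed in the cycloid from (0, 0) to (pi, 2)."""
    return [(t - math.sin(t), 1.0 - math.cos(t))
            for t in (math.pi * k / n for k in range(n + 1))]

straight = [(0.0, 0.0), (math.pi, 2.0)]
print(polygon_time(straight))              # one segment: the inclined plane
print(polygon_time(cycloid_polygon(200)))  # 200 segments along the cycloid
print(math.pi / math.sqrt(G))              # exact time along the cycloid itself
```

Each polygon with fixed abscissas is described by finitely many ordinates, which is precisely how the infinite-dimensional problem becomes a finite-dimensional one.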


3. Problems of the calculus of variations and Lagrange’s principle for 
them. We have used the term “calculus of variations” many times, and now it 
is time to make it precise. Suppose we are given some function $f(x, y, z)$, 
a continuous function of three variables. We consider the functional 

$$F(y) = \int_a^b f(x, y(x), y'(x))\,dx.$$

This functional can be considered in different spaces, but most often it 
is investigated in the space $C^1$. By the calculus of variations we mean the 
section of the theory of extremal problems devoted to the study of maxima and 
minima of such functionals for various constraints. (I’ll say more about this 
later.) 

Let us see what we must do to obtain the number $F(y)$ for a given function 
$y(x)$ in $C^1([a, b])$. First we must differentiate $y(x)$. Then we must put 
$y(x)$ for the second argument and $y'(x)$ for the third. The result is a 
function of one variable that associates to a number $x$ the number 
$f(x, y(x), y'(x))$. Finally, we must integrate this function. In this way we 
obtain the required number $F(y)$. 
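The three steps (differentiate, substitute, integrate) translate directly into a short numerical routine. This is a rough sketch, assuming a central difference for the derivative and the midpoint rule for the integral; the name `functional` and the two sample integrands are introduced here only for illustration.

```python
import math

def functional(f, y, a, b, n=2000, h=1e-6):
    """Approximate F(y) = integral over [a, b] of f(x, y(x), y'(x)) dx.

    y'(x) is estimated by a central difference with step h, and the
    integral by the midpoint rule with n subintervals.
    """
    dx = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * dx
        dy = (y(x + h) - y(x - h)) / (2.0 * h)   # step 1: differentiate
        total += f(x, y(x), dy)                  # step 2: substitute
    return total * dx                            # step 3: integrate

length_integrand = lambda x, y, z: math.sqrt(1.0 + z * z)   # length functional
area_integrand = lambda x, y, z: y                          # area functional

print(functional(length_integrand, lambda x: 2 * x, 0.0, 1.0))  # near sqrt(5)
print(functional(area_integrand, lambda x: x * x, 0.0, 1.0))    # near 1/3
```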

Let’s return to the twelfth story for a while. In the first section of this 
story we posed the problem of minimization or maximization of a function 
of a number of variables subject to equality and inequality constraints. In 
the absence of inequality constraints the problem would take the form 


$$F_0(x) \to \min(\max), \qquad F_i(x) = 0,\ i = 1, \dots, m, \tag{p}$$

where $x = (x_1, \dots, x_n)$ and the $F_i(x)$ are functions of $n$ variables (we 
have deliberately replaced $f_i$ by $F_i$). 

Now let’s study exclusively problems in which the $F_i$ are not functions 
of many variables, but functionals like the $F(y)$ introduced earlier. Such 
functionals are called functionals of the classical calculus of variations. The 
preceding sections contain a number of relevant examples, such as the length 
functional ($f(x, y, z) = \sqrt{1 + z^2}$), the area functional 
($f(x, y, z) = y$), and the Bernoulli functional 
($f(x, y, z) = \sqrt{1 + z^2}/\sqrt{2gy}$). 

Let $f_0(x, y, z)$, $f_1(x, y, z)$, ..., $f_m(x, y, z)$ be a selection of 
functions. Consider their corresponding functionals $F_0(y)$, $F_1(y)$, ..., 
$F_m(y)$ from the calculus of variations and the following variational problem 

$$F_0(y) \to \min(\max), \qquad F_i(y) = \alpha_i,\ i = 1, \dots, m. \tag{$p_1$}$$

This problem is called the isoperimetric problem of the classical calculus 
of variations. The functions $y(x)$ in $C^1([a, b])$ satisfying the conditions 
$F_i(y) = \alpha_i$, $i = 1, \dots, m$, $y(a) = y_0$, and $y(b) = y_1$, are said to 
be admissible in the problem $(p_1)$. 

In the absence of constraints of the type $F_i(y) = \alpha_i$, problem $(p_1)$ 
takes the form 

$$F_0(y) \to \min(\max) \quad \Bigl(\Leftrightarrow \int_a^b f_0(x, y(x), y'(x))\,dx \to \min(\max)\Bigr) \tag{$p_2$}$$

over all $y(x)$ such that $y(a) = y_0$ and $y(b) = y_1$. Problem $(p_2)$ is called 
the simplest problem of the classical calculus of variations. 

The brachistochrone problem belongs to this class of simplest problems. 
Dido’s second problem is part of the class of isoperimetric problems (hence 
the term “isoperimetric” as applied to problem $(p_1)$). 

Newton’s problem belongs to neither of these classes, because the 
constraint $y'(x) \ge 0$ is absent from $(p_1)$ as well as from $(p_2)$. 

Now arises the question of how to define the notion of a local minimum 
(maximum) in problem $(p_1)$. To answer this question it is necessary to 
introduce some measure of “distance” between functions in $C^1([a, b])$. We 
take as the distance between a function $y(x)$ in $C^1([a, b])$ and the function 
that is identically zero the number 

$$\|y\|_1 = \max_{a \le x \le b} |y(x)| + \max_{a \le x \le b} |y'(x)|$$

(called the norm of the function $y(x)$), and as the distance between functions 
$y_1(x)$ and $y_2(x)$ the number $\|y_1 - y_2\|_1$. In the space $C([a, b])$, we 
define the norm of a function $y(x)$ as 

$$\|y\|_0 = \max_{a \le x \le b} |y(x)|.$$

Now we can define a local minimum in a way that is entirely analogous to 
definition 1 in the twelfth story. 

DEFINITION. A function $\hat y(x)$ is said to yield a local minimum 
(maximum) in problem $(p_1)$ if there is an $\varepsilon > 0$ such that for all 
functions $y$ admissible in $(p_1)$ and satisfying the inequality 

$$\|y - \hat y\|_1 < \varepsilon$$

we have the inequality $F_0(y) \ge F_0(\hat y)$ ($F_0(y) \le F_0(\hat y)$). 

Now we come to the key question of how to solve problem (p, ). 

Recall the meaning of the Lagrange principle as applied to problem (p). 
It consisted of two assertions. 

1. For unconstrained problems a necessary condition for an extremum at
a point $\hat{x}$ is the equality

$$F'(\hat{x}) = 0$$
(Fermat’s theorem).

2. To solve problem (p) we must form the Lagrange function and treat it 
as if the variables were independent (that is, we must apply Fermat’s theo- 
rem). We called the second assertion the Lagrange principle. 

All this turns out to have a perfect analog in the case of problem (p,). 
All we need do is modify the meaning of Fermat’s theorem. Specifically, we 
have the following theorem. 


THEOREM (Euler). Let $f_0$ in the simplest problem be a continuously
differentiable function of three variables. If a function $\hat{y}(x)$ yields a
local extremum (minimum or maximum) in the simplest problem, then the
following equation holds:


(1) $\frac{d}{dx} f_{0y'}(x, \hat{y}(x), \hat{y}'(x)) - f_{0y}(x, \hat{y}(x), \hat{y}'(x)) = 0.$

This equation is called Euler’s equation for the simplest problem. Its
admissible solutions are called the stationary points or extremals of the
problem.

Equation (1) is a decoded version of an equation of type $F'(\hat{x}) = 0$ as
applied to the simplest problem. We will try to explain this in the next
section. However, in order to solve concrete problems we need not know the
origin of equation (1). Thus the algorithm (that goes back to Euler) for
solving simplest problems consists of the following:

Find all solutions of equation (1) (they depend on two arbitrary constants)
that pass through the given points and select the ones for which the
functional $F_0$ takes on its least (largest) value.

It is easy to show by direct differentiation that if $f_0$ does not depend on
$x$, then equation (1) admits the following relation (“integral”):


(1') $f_0(\hat{y}(x), \hat{y}'(x)) - \hat{y}'(x) f_{0y'}(\hat{y}(x), \hat{y}'(x)) = \text{constant}.$


In other words, every solution of equation (1) satisfies (1').
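The direct differentiation can be written out as follows. This is a sketch under the extra assumption (not stated in the text) that the solution $y$ is twice differentiable, so that the chain rule applies to each term; the hats are dropped for brevity.

```latex
\begin{aligned}
\frac{d}{dx}\Bigl(f_0(y, y') - y' f_{0y'}(y, y')\Bigr)
  &= f_{0y}\, y' + f_{0y'}\, y'' - y'' f_{0y'} - y' \frac{d}{dx} f_{0y'} \\
  &= -\,y' \Bigl(\frac{d}{dx} f_{0y'} - f_{0y}\Bigr) = 0 ,
\end{aligned}
```

where the last equality uses equation (1). The absence of an explicit $x$ in $f_0$ is what removes the term $f_{0x}$ from the first line.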
The Lagrange method can be applied to the general isoperimetric problem
without any modification. We must form the Lagrange function
$$\mathcal{L} = \lambda_0 F_0(y) + \lambda_1 F_1(y) + \cdots + \lambda_m F_m(y),$$


which can also be written as


$$\mathcal{L} = \int_a^b f(x, y(x), y'(x))\,dx,$$
with
$$f(x, y, z) = \lambda_0 f_0(x, y, z) + \lambda_1 f_1(x, y, z) + \cdots + \lambda_m f_m(x, y, z),$$
and proceed as if our task was to find an extremum of the function
$\mathcal{L}$ where the functions $y$ are independent. In other words, we must
write down the Euler equation
$$\frac{d}{dx} f_{y'} - f_y = 0 \iff \frac{d}{dx}\bigl(\lambda_0 f_{0y'} + \cdots + \lambda_m f_{my'}\bigr) - \bigl(\lambda_0 f_{0y} + \cdots + \lambda_m f_{my}\bigr) = 0.$$
All this is based on the following theorem.


THEOREM (the Lagrange multiplier rule for isoperimetric problems). Let
$f_0, \ldots, f_m$ be continuously differentiable functions. If a function
$\hat{y}(x)$ yields a local extremum (minimum or maximum) in the isoperimetric
problem, then there are numbers $\lambda_0, \ldots, \lambda_m$, not all zero,
such that Euler’s equation holds:


(2) $\frac{d}{dx}\bigl(\lambda_0 f_{0y'}(x, \hat{y}(x), \hat{y}'(x)) + \cdots + \lambda_m f_{my'}(x, \hat{y}(x), \hat{y}'(x))\bigr) - \bigl(\lambda_0 f_{0y}(x, \hat{y}(x), \hat{y}'(x)) + \cdots + \lambda_m f_{my}(x, \hat{y}(x), \hat{y}'(x))\bigr) = 0.$


The admissible solutions of equation (2) are called stationary solutions.
The Lagrange multiplier rule justifies the following four-stage prescription
for obtaining a solution of the isoperimetric problem.
1° Formalization of the problem. 2° Application of the Lagrange principle,
that is, setting down equation (2) together with the equations $F_i(y) = a_i$
and the boundary conditions $y(a) = y_0$, $y(b) = y_1$. 3° Finding all
stationary solutions. 4° Selection of the stationary solutions that are
solutions of the problem.

We'll use this algorithm to solve our two problems, namely the brachis- 
tochrone problem and Dido’s problem. 

Solution of the brachistochrone problem.


1° Formalization. We formalized the brachistochrone problem as one of
the simplest type with


$$f_0(x, y, z) = \frac{\sqrt{1 + z^2}}{\sqrt{2gy}}$$


($f_0$ does not depend on $x$).


2° Necessary condition. Euler’s equation


$$\frac{d}{dx} f_{0y'} - f_{0y} = 0$$


admits the integral


$$f_0 - y' f_{0y'} = \text{constant} \iff \frac{\sqrt{1 + y'^2}}{\sqrt{2gy}} - \frac{y'^2}{\sqrt{1 + y'^2}\,\sqrt{2gy}} = \text{constant}$$

(*) $\Rightarrow \sqrt{1 + y'^2}\,\sqrt{y} = C,$


where $C$ is some constant. We recall that this very relation was obtained by
Johann Bernoulli.

3° Finding the stationary points. This is tantamount to finding the so- 
lutions of equation (*). But we have already integrated this equation and 
found that its solutions are a family of cycloids. 


4° Discussion. It was shown in the seventh story that there is just one 
admissible cycloid in the family of cycloids that are the solutions of (*). 
This cycloid is the solution of the problem (of course, this assertion requires 
justification). 
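Relation (*) can be checked numerically on a cycloid. The sketch below assumes the standard parametrization $x = r(t - \sin t)$, $y = r(1 - \cos t)$ with $y$ measured downward from the starting point; the function name and the sample values of $r$ and $t$ are illustrative.

```python
import math

def cycloid_invariant(r, t):
    """Value of sqrt(1 + y'^2) * sqrt(y) at parameter t on the cycloid
    x = r (t - sin t), y = r (1 - cos t)."""
    dx_dt = r * (1.0 - math.cos(t))
    dy_dt = r * math.sin(t)
    slope = dy_dt / dx_dt            # y' = dy/dx along the curve
    y = r * (1.0 - math.cos(t))
    return math.sqrt(1.0 + slope ** 2) * math.sqrt(y)

values = [cycloid_invariant(2.0, t) for t in (0.5, 1.0, 2.0, 3.0)]
print(values)  # each value should agree with sqrt(2 r)
```

Since $1 + y'^2 = 2/(1 - \cos t)$ on this curve, the product is $\sqrt{2r}$ for every $t$, so the constant $C$ in (*) fixes the radius of the generating circle.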

Solution of Dido’s second problem.


1° Formalization. We've already formalized Dido’s problem as an
isoperimetric problem with $f_0(x, y, z) = y$, $f_1(x, y, z) = \sqrt{1 + z^2}$.


2° Necessary condition. The Lagrange principle. We form the sum
$f = \lambda_0 f_0 + \lambda_1 f_1$ and write down Euler’s equation


(**) $\frac{d}{dx}\,\frac{\lambda_1 y'}{\sqrt{1 + y'^2}} - \lambda_0 = 0.$


3° Finding the stationary points. We solve this equation. If
$\lambda_0 = 0$, then $y \equiv 0$ (check this, bearing in mind the boundary
conditions). This is possible only if $l = a$. Thus if $l > a$, then we can
assume that $\lambda_0 = 1$. Then (**) yields


$$\frac{d}{dx}\,\frac{\lambda_1 y'}{\sqrt{1 + y'^2}} = 1 \Rightarrow \frac{y'}{\sqrt{1 + y'^2}} = Cx + D$$


$$\Rightarrow \frac{y'^2}{1 + y'^2} = (Cx + D)^2 \Rightarrow \frac{dy}{dx} = \pm\frac{Cx + D}{\sqrt{1 - (Cx + D)^2}}$$


$$\Rightarrow dy = \pm\frac{(Cx + D)\,dx}{\sqrt{1 - (Cx + D)^2}} \Rightarrow y + \beta = \mp\frac{1}{C}\sqrt{1 - (Cx + D)^2}$$


$$\Rightarrow (x + \alpha)^2 + (y + \beta)^2 = r^2,$$


where $\alpha = D/C$, $r = 1/|C|$, and $\beta$ is a constant of integration.
This is the family of all circles.


4° Discussion. Now it is easy to find the required solution. If
$a < l \le \pi a/2$, then our family of circles contains just one circular arc
of length $l$ joining the points $(0, 0)$ and $(a, 0)$. If $l > \pi a/2$, then
the solution will be the semicircle with center $(a/2, (l - \pi a/2)/2)$ and
radius $a/2$ “supplemented” by the segments $x = 0$,
$0 \le y \le (l - \pi a/2)/2$, and $x = a$, $0 \le y \le (l - \pi a/2)/2$.
(See Figure 14.1 on page 148.)
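For $a < l \le \pi a/2$ the radius of the required arc can be found numerically. The sketch below is an illustration, not the book's method: it assumes the arc is the minor arc over the chord from $(0, 0)$ to $(a, 0)$, whose length is $2r \arcsin(a/(2r))$, and solves for $r$ by bisection. The function names and tolerance are hypothetical.

```python
import math

def arc_length(r, a):
    """Length of the minor arc of a circle of radius r over a chord of length a."""
    return 2.0 * r * math.asin(a / (2.0 * r))

def dido_radius(a, l, tol=1e-12):
    """Radius of the circular arc of length l over the chord [0, a],
    valid for a < l <= pi * a / 2."""
    if not (a < l <= math.pi * a / 2.0):
        raise ValueError("need a < l <= pi*a/2")
    lo, hi = a / 2.0, a
    # arc_length decreases from pi*a/2 (at r = a/2) toward a as r grows,
    # so enlarge hi until the arc is no longer than l.
    while arc_length(hi, a) > l:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if arc_length(mid, a) > l:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(dido_radius(2.0, math.pi))  # the semicircle case: radius close to a/2 = 1
```

The limiting case $l = \pi a/2$ gives the semicircle of radius $a/2$; for longer $l$ the function raises an error, in line with the absence of a classical solution.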

Note the difference between our solutions in the thirteenth story and here. 
In the earlier story the problems were solved “to the very end.” Here there 
is an element of indeterminacy connected with the existence of a solution. 
In problems of the calculus of variations, existence of solutions is more dif- 
ficult to establish than in the finite-dimensional case. Also, in the calculus 
of variations it is often the case that solutions just don’t exist. For example, 
in the just-investigated problem of Dido, there is no solution in the usual
sense for $l > \pi a/2$; indeed, the raised semicircle “supplemented by
segments” is not a function that joins the points $(0, 0)$ and $(a, 0)$. In such
cases mathematicians speak of “generalized” solutions.

In the case of the brachistochrone, matters are not as simple as they may at 
first appear. The difficulty is that a cycloid is not continuously differentiable. 
In other words, there is no solution of the brachistochrone problem in the
totality of functions (in the space $C^1([a, b])$) in which the problem was
considered.

The eminent twentieth-century mathematician David Hilbert (whose
words serve as an epigraph for this story) advanced the view that every
reasonable variational problem must have a solution “if, whenever necessary,
the notion of a solution is given an extended meaning.”

Hilbert’s idea turns out to be correct for most problems, including the 
brachistochrone problem, Dido’s problem, and the two problems formulated 
at the end of the seventh story. We will now turn to the solution of these two 
problems. 


L’Hospital’s problem


1° Formalization. The time of propagation of light from the point $(0, y_0)$
to the point $(a, y_1)$ in a medium in which the velocity of propagation
depends only on the altitude $y$ and is equal to $v(y)$ is given by the integral


$$T(y) = \int_0^a \frac{\sqrt{1 + y'^2(x)}}{v(y(x))}\,dx.$$

To see that this is so, it suffices to take another look at formula (2) in the
seventh story. In sum, we end up with the following simplest problem of the
classical calculus of variations:


$$\int_0^a \frac{\sqrt{1 + y'^2}}{y}\,dx \to \min, \qquad y(0) = y_0, \quad y(a) = y_1 \qquad \Bigl(f_0(x, y, z) = \frac{\sqrt{1 + z^2}}{y}\Bigr).$$


2° Necessary condition. Euler’s equation admits the following integral:
$$f_0 - y' f_{0y'} = \text{const} \Rightarrow y\sqrt{1 + y'^2} = D.$$


3° Finding the stationary points.


$$y\sqrt{1 + y'^2} = D \Rightarrow \frac{y\,dy}{\sqrt{D^2 - y^2}} = \pm dx \Rightarrow (x - C_1)^2 + y^2 = D^2.$$
We've integrated Euler’s equation. The result is a family of semicircles with
centers on the x-axis.


4° Discussion. It is easy to see that for any two points $(0, y_0)$ and
$(a, y_1)$ there is exactly one circle from our family of circles that passes
through those points. This circle is the solution of our problem. A proof of
this fact is beyond the scope of this book. It is interesting that these very
semicircles are straight lines in the Poincaré model of the hyperbolic plane.
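The unique circle through two given points is easy to compute. The sketch below assumes $a \ne 0$ and positive ordinates; the center abscissa follows from equating the squared distances to the two points, and the function names are illustrative.

```python
import math

def semicircle_through(y0, a, y1):
    """Center abscissa c and radius R of the circle centered on the x-axis
    that passes through (0, y0) and (a, y1), assuming a != 0."""
    c = (a * a + y1 * y1 - y0 * y0) / (2.0 * a)
    return c, math.hypot(c, y0)

def first_integral(c, R, x):
    """Value of y * sqrt(1 + y'^2) on the upper semicircle (x - c)^2 + y^2 = R^2."""
    y = math.sqrt(R * R - (x - c) ** 2)
    slope = -(x - c) / y
    return y * math.sqrt(1.0 + slope * slope)

c, R = semicircle_through(1.0, 2.0, 1.0)
print(c, R, first_integral(c, R, 0.5))  # the last value should agree with R
```

On such a semicircle the quantity $y\sqrt{1 + y'^2}$ equals the radius at every point, so the constant in the integral of Euler's equation is the radius of the circle.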
Problem of the minimal surface of revolution


1° Formalization.


$$\int_{x_0}^{x_1} y\sqrt{1 + (y'(x))^2}\,dx \to \min, \qquad y(x_0) = y_0, \quad y(x_1) = y_1.$$


Recall that, except for the missing factor $2\pi$, this
“area-of-a-surface-of-revolution” functional was introduced at the end of the
seventh story. Thus, here
$$f_0(x, y, z) = y\sqrt{1 + z^2}.$$


2° Necessary condition. Euler’s equation admits the following integral:


$$f_0 - y' f_{0y'} = \text{const} \Rightarrow 1 + y'^2 = D^2 y^2.$$


3° Finding the stationary points (extremals). We solve the equation in
step 2° with the help of a substitution:


$$Dy = (e^t + e^{-t})/2 \Rightarrow D\,dy = (e^t - e^{-t})\,dt/2,$$


$$\sqrt{D^2 y^2 - 1} = \sqrt{(e^t + e^{-t})^2/4 - 1} = (e^t - e^{-t})/2$$


$$\Rightarrow \frac{dy}{\sqrt{D^2 y^2 - 1}} = \frac{dt}{D} = \pm dx \Rightarrow t = \pm(Dx + D_1)$$


$$\Rightarrow y = \bigl(e^{Dx + D_1} + e^{-(Dx + D_1)}\bigr)/(2D).$$


The curve
$$y = (e^x + e^{-x})/2$$


is called a catenary.


4° Discussion. We have shown that if a solution of the problem of a 
minimal surface exists, then the curve of revolution is a catenary. 

At this point we will again discuss Hilbert’s idea of a generalized solution. 
In the brachistochrone problem and in l’Hospital’s problem, we obtained
solutions. These are “genuine,” rather than generalized, solutions. True, in
both cases, the extremals are not continuously differentiable functions when
the ordinate of one of the endpoints is zero. (Note that in l’Hospital’s
problem, light from such a point would take an “infinite time” to propagate, so
that there are physical reasons for ignoring points with zero ordinate.)

In Dido’s problem we obtained a solution for $l \le \pi a/2$ and a generalized
solution for $l > \pi a/2$. In the minimal-surface problem the situation is more
complex in the sense that sometimes a classical solution exists, and sometimes
it doesn’t (see Figure 14.2, where $x_0 = -a$, $x_1 = a$, $y_1 = y_0$). If there
is no classical solution, then the minimum is yielded by the generalized
solution consisting of the segments $x = -a$, $0 \le y \le y_0$; $x = a$,
$0 \le y \le y_0$, joined by the segment $y = 0$, $-a \le x \le a$. (See Figure
14.2(a).) Then the “surface of revolution” consists of two disks connected by
the “bridge” $y = 0$, $-a \le x \le a$. (See Figure 14.2(b).) When a classical
solution exists, we must choose the minimum from the classical and generalized
solutions.

FIGURE 14.2. Panels (a) and (b).
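Whether a catenary through the two endpoints exists at all can be tested numerically in the symmetric case. The sketch below rests on assumptions not spelled out in the text: by symmetry the catenary is taken as $y = \cosh(Dx)/D$, so the boundary condition reads $\cosh(Da)/D = y_0$, and this has a root $D > 0$ exactly when $y_0/a$ is at least the minimum of $\cosh(t)/t$ over $t > 0$. The function names are illustrative.

```python
import math

def critical_ratio(tol=1e-14):
    """Smallest y0/a for which cosh(D a)/D = y0 has a root D > 0.

    It equals the minimum of cosh(t)/t over t > 0, attained where
    t * tanh(t) = 1; that root is located by bisection on [1, 2]."""
    lo, hi = 1.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * math.tanh(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return math.cosh(t) / t

def has_catenary(a, y0):
    """True if some catenary cosh(D x)/D passes through (-a, y0) and (a, y0)."""
    return y0 / a >= critical_ratio()

print(critical_ratio(), has_catenary(1.0, 2.0), has_catenary(1.0, 1.0))
```

This test only decides whether a stationary catenary exists; comparing its area with that of the two disks joined by the “bridge” is a separate step.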


4. From the history of the calculus of variations. In the previous section 
we explained the substance behind the term “calculus of variations.” Now 
the time has come to discuss the historical evolution of this discipline and 
the origin of the term itself. 

We recall that it all began with the brachistochrone—the problem posed 
in 1696 by Johann Bernoulli. This problem attracted universal attention, 
and soon a few similar problems were solved. (We dealt with some of them, 
namely l’Hospital’s problem and the minimal-surface problem.) While each 
problem was solved individually, it was sensed that a uniform approach is 
possible. 

Then Johann Bernoulli set his student Leonhard Euler the task of trying to
find a general method for solving all such problems. Euler succeeded. In 1744 
Euler published the memoir, A method for discovering curved lines having a 
maximum or minimum property or the solution of the isoperimetric problem 
taken in its widest sense. Euler’s method involved finding the equation that 
must be satisfied by a “curved line having a maximum or minimum property.” 
We set this equation down before; it came to be known as Euler’s equation. 

Note the term “isoperimetric” in the title of Euler’s work. We began this 
book with the isoperimetric problem. How does it relate to Euler? It is true 
that by means of his method Euler was also able to solve the isoperimetric 
problem. But in numerous other cases that Euler was able to settle, the con- 
straints had nothing to do with the length of the curve, with its perimeter. 
Nevertheless the term “isoperimetric” reflected the continuity of our disci- 
pline and the name stuck. 

In 1759 a highly significant event occurred. The very young Lagrange 
wrote his work bearing on this topic. He approached it from a different 
direction, and with such success that henceforth his method (sometimes
called the variational method) was universally adopted. Euler was delighted 
with Lagrange’s paper and refrained from publishing his own elaborations on 
this topic so as to enable the young scholar to carry his designs to completion. 
Euler called the whole new chapter of mathematics the calculus of variations. 

What is the essence of Lagrange’s method? What is a variation? 

Let’s return to Section 2 of the twelfth story where we derived the finite- 
dimensional version of Fermat’s theorem. We then reasoned as follows. 

Let $f_0(x)$ be a function of $n$ variables $x = (x_1, \ldots, x_n)$. We assume
that the function is differentiable and that it attains a local extremum at a
point $\hat{x}$. Then it is clear that the function of one variable


$$g(\lambda) = f_0(\hat{x}_1, \ldots, \hat{x}_{i-1}, \hat{x}_i + \lambda, \hat{x}_{i+1}, \ldots, \hat{x}_n)$$


must have a local extremum at zero. In view of the (one-dimensional) theorem of
Fermat, the following equality must hold


$$g'(0) = \frac{\partial f_0(\hat{x})}{\partial x_i} = 0.$$


Lagrange applied this very method to the simplest problem of the classical 
calculus of variations: 


(1) $F(y) = \int_{x_0}^{x_1} f(x, y(x), y'(x))\,dx \to \min(\max), \qquad y(x_0) = y_0, \quad y(x_1) = y_1.$


We will follow Lagrange’s train of thought. Suppose that the function
$f(x, y, z)$ in (1) is continuously differentiable and the functional $F(y)$
attains a local minimum for the continuously differentiable curve
$\hat{y}(x)$. Now we take a “variation” of $\hat{y}(x)$. Specifically, we take
any continuously differentiable curve $y(x)$ that vanishes at the endpoints:
$y(x_0) = y(x_1) = 0$. Then “variation” of $\hat{y}(x)$, that is, addition to
$\hat{y}(x)$ of the function $y(x)$ multiplied by any number $\lambda$, does not
take us outside the set of admissible curves; indeed, all of them pass through
the points $(x_0, y_0)$ and $(x_1, y_1)$. This means that the function of one
variable


$$g(\lambda) = g(\lambda, y) = F(\hat{y} + \lambda y) = \int_{x_0}^{x_1} f(x, \hat{y}(x) + \lambda y(x), \hat{y}'(x) + \lambda y'(x))\,dx$$


must have a local minimum at zero. But then we can again apply the
one-dimensional Fermat theorem. This theorem implies that $g'(0) = 0$. Now
we compute $g'(0)$.

One proves in analysis that our assumptions about the continuous
differentiability of $f(x, y, z)$ and $\hat{y}(x)$ justify differentiation
under the integral sign. After some simple computations we find that


(2) $g'(0) = \int_{x_0}^{x_1} \bigl(a(x) y(x) + b(x) y'(x)\bigr)\,dx,$


where $a(x)$ and $b(x)$ denote $\frac{\partial f}{\partial y}(x, \hat{y}(x), \hat{y}'(x))$
and $\frac{\partial f}{\partial y'}(x, \hat{y}(x), \hat{y}'(x))$
respectively. Thus, following Lagrange, we conclude that if $\hat{y}(x)$ yields
a local maximum or minimum in the simplest problem (1), then, for any
continuously differentiable function $y(x)$ with $y(x_0) = y(x_1) = 0$, we have
the equality


(3) $\int_{x_0}^{x_1} \bigl(a(x) y(x) + b(x) y'(x)\bigr)\,dx = 0.$


We will continue our reasoning. We find a function $c(x)$ such that
$c'(x) = a(x)$ and $\int_{x_0}^{x_1} c(x)\,dx = \int_{x_0}^{x_1} b(x)\,dx = B$.
To this end we choose a constant $D$ such that the integral of the function
$c(x) = \int_{x_0}^{x} a(\xi)\,d\xi + D$ over the interval $[x_0, x_1]$ is $B$.
Integrating the first summand in (3) by parts, we find that


$$0 = \int_{x_0}^{x_1} \bigl(a(x) y(x) + b(x) y'(x)\bigr)\,dx = \int_{x_0}^{x_1} \bigl(c'(x) y(x) + b(x) y'(x)\bigr)\,dx$$


$$= \int_{x_0}^{x_1} \bigl(b(x) - c(x)\bigr) y'(x)\,dx.$$


Now we take the last step. We put
$\bar{y}(x) = \int_{x_0}^{x} (b(\xi) - c(\xi))\,d\xi$. Then it is clear that
$\bar{y}(x_0) = 0$. Also, $\bar{y}(x_1) = \int_{x_0}^{x_1} (b(x) - c(x))\,dx = 0$
by the construction of the function $c(x)$. Moreover,
$\bar{y}'(x) = b(x) - c(x)$; this follows from the Newton-Leibniz formula. But
then (3) must hold for $\bar{y}(x)$, that is,
$\int_{x_0}^{x_1} (b(x) - c(x))^2\,dx = 0$. Since the integral of a continuous
nonnegative function can be zero only if the function vanishes identically, we
conclude that $b(x) = c(x)$. This means that


$$b'(x) = a(x),$$


that is,


(4) $\frac{d}{dx}\,\frac{\partial f}{\partial y'}(x, \hat{y}(x), \hat{y}'(x)) - \frac{\partial f}{\partial y}(x, \hat{y}(x), \hat{y}'(x)) = 0.$

This is the derivation of Euler’s equation in the manner of Lagrange.

Recall that in the previous section we said that Euler’s equation is the
decoded version of Fermat’s theorem for the simplest problem. By now the
meaning of this remark is clear. Euler’s equation is a consequence of the fact
that the derivative of the functional $F(y)$ at the point $\hat{y}(x)$ in every
direction $y(x)$ (with $y(x_0) = y(x_1) = 0$) is zero. We note that the
expression for $g'(0)$ has come to be known as the variation of the
functional $F$.
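The vanishing of the variation on an extremal can be observed numerically. The sketch below uses a hypothetical sample integrand $f(x, y, z) = z^2 + y^2$ on $[0, 1]$ with $y(0) = 0$, $y(1) = 1$, for which Euler's equation $y'' = y$ gives $\hat{y}(x) = \sinh x / \sinh 1$; the direction $\sin(\pi x)$, the midpoint rule, and the step size are likewise assumptions made for illustration.

```python
import math

def functional(y, dy, n=2000):
    """Midpoint-rule value of F(y) = integral over [0, 1] of (y'^2 + y^2) dx."""
    h = 1.0 / n
    return sum((dy((i + 0.5) * h) ** 2 + y((i + 0.5) * h) ** 2) * h
               for i in range(n))

def first_variation(y, dy, v, dv, lam=1e-4):
    """Central-difference estimate of g'(0), where g(lam) = F(y + lam * v)."""
    plus = functional(lambda x: y(x) + lam * v(x), lambda x: dy(x) + lam * dv(x))
    minus = functional(lambda x: y(x) - lam * v(x), lambda x: dy(x) - lam * dv(x))
    return (plus - minus) / (2.0 * lam)

# A direction that vanishes at both endpoints.
v = lambda x: math.sin(math.pi * x)
dv = lambda x: math.pi * math.cos(math.pi * x)

extremal = first_variation(lambda x: math.sinh(x) / math.sinh(1.0),
                           lambda x: math.cosh(x) / math.sinh(1.0), v, dv)
straight = first_variation(lambda x: x, lambda x: 1.0, v, dv)
print(extremal, straight)  # close to zero on the extremal, not on the straight line
```

For the straight line $y = x$, which joins the same endpoints but does not satisfy Euler's equation, the same direction gives a variation of about $2/\pi$.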

Lagrange did not stop with the mathematical problems of the calculus of
variations. Strictly speaking, he concerned himself with these problems so
that he could apply them to problems in the natural sciences. His life’s main
work, Analytical mechanics, is a book about motions of physical bodies.
At the basis of Lagrange’s approach to mechanics lies an extremal principle
known as the principle of least action. We will illustrate it using a very
simple example.

Suppose a small ball of mass m is attached to a spring of negligible weight 
that obeys Hooke’s law of the proportionality of the tension in the spring and 
its deflection from the rest position O. The spring is aligned with the y-axis. 
The displacement of the ball is given by a function y(t), where y(t) is the 
ball’s coordinate at time $t$. It is well known that $y(t)$ satisfies Newton’s
law


(5) $m y''(t) = -k y(t),$


which asserts that “the product of mass by acceleration equals the acting force”
(the acceleration is given by the second derivative and the force, by Hooke’s
law, equals $-k y(t)$, where $k$ is a proportionality constant). In mechanics
$T = m(y'(t))^2/2$ is called the kinetic energy, $U = k y^2(t)/2$ the potential
energy, and the integral of the difference of the kinetic and potential energies
is called the action.

Let’s consider the problem of minimizing the action for fixed boundary
conditions:


(6) $\int_{t_0}^{t_1} (T - U)\,dt = \int_{t_0}^{t_1} \Bigl(\frac{m y'^2}{2} - \frac{k y^2}{2}\Bigr)\,dt \to \min, \qquad y(t_0) = y_0, \quad y(t_1) = y_1.$


Euler’s equation for problem (6) yields equation (5):


$$f = \frac{m y'^2}{2} - \frac{k y^2}{2} \Rightarrow f_{y'} = m y', \quad f_y = -k y \Rightarrow \frac{d}{dt} f_{y'} - f_y = 0 \iff m y'' + k y = 0.$$
Thus Newton’s second law is none other than Euler’s equation for the action.
Put differently, the actual motions are determined by the stationary points of 
the action. For small time intervals the actual trajectory does indeed mini- 
mize the action, so that the principle of least action is true for such intervals. 
In general, it is more appropriate to speak of the principle of stationary action. 
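The distinction between short and long time intervals can be checked on the spring. Because the action (6) is quadratic, replacing a solution $\hat{y}$ of (5) by $\hat{y} + \varepsilon h$, with $h$ vanishing at the endpoints, changes the action by exactly $\varepsilon^2 \int (m h'^2 - k h^2)/2\,dt$. The sketch below evaluates this integral for the particular variation $h(t) = \sin(\pi t/T)$ on $[0, T]$; the choice of $h$, the units $m = k = 1$, and the midpoint rule are assumptions made for illustration.

```python
import math

def second_variation(m, k, T, n=4000):
    """Midpoint-rule value of the integral over [0, T] of (m h'^2 - k h^2)/2 dt
    for the variation h(t) = sin(pi t / T), which vanishes at t = 0 and t = T."""
    dt = T / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        h = math.sin(math.pi * t / T)
        dh = (math.pi / T) * math.cos(math.pi * t / T)
        total += 0.5 * (m * dh * dh - k * h * h) * dt
    return total

print(second_variation(1.0, 1.0, 1.0))  # positive: this variation raises the action
print(second_variation(1.0, 1.0, 4.0))  # negative: this variation lowers the action
```

The exact value is $(T/4)(m\pi^2/T^2 - k)$, which changes sign at $T = \pi\sqrt{m/k}$, half a period of the oscillation. For longer intervals the actual motion is a stationary point of the action but not a minimum.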

We have again run up against the fact that the laws of nature admit dual de- 
scriptions, one “physical” and the other “extremal.” This was first mentioned 
in the third story and recalled in the seventh. There we discussed optics and 
minimized time; here we describe the motion of bodies and minimize the 
action. Following Hamilton, who investigated optical phenomena from this 
point of view, Jacobi suggested the consideration of analogs of wavefronts in 
mechanical problems and, in general, in all of the simplest problems of the 
classical calculus of variations. 

Jacobi considered the endpoint function $S(x, y)$, whose value is that of
the integral $\int_{x_0}^{x} f(\xi, y(\xi), y'(\xi))\,d\xi$ on the extremal (that
yields a minimum) joining a fixed point $(x_0, y_0)$ to the point $(x, y)$.

It is obvious that any part of an extremal that yields a minimum is an
extremal that yields a minimum. This simple observation is the decoding,
in the general situation, of Huygens’ principle in optics (mentioned in the 
third story) and is also called Huygens’ principle. Using Huygens’ principle 
and Euler’s equation it is easy to derive the equation satisfied by the function 
S(x,y). This equation is called the Hamilton-Jacobi equation. It has been 
possible to integrate this equation in many cases of interest. This method 
affords another possibility of investigation of problems of the classical calcu- 
lus of variations. Thus the duality of the description of optical phenomena 
has led to a dual description of the solutions of an arbitrary problem of the 
classical calculus of variations. 

Now we will go back in time, back to the eighteenth century. In order to be 
able to extract consequences from the principle of least action, it was neces- 
sary to learn to solve problems of the calculus of variations under constraints 
more complicated than isoperimetric constraints. Specifically, one had to
learn to solve problems subject to constraints given by differential equations.
Let’s look at one general formulation (that goes back to Lagrange) to which
the majority of the most interesting applied problems can be reduced. Let
$f_j = f_j(x, y_1, \ldots, y_n, z_1, \ldots, z_n)$, $j = 0, 1, \ldots, k$,
$k < n$, be functions of $2n + 1$ variables. We consider the problem


$$F = \int_{x_0}^{x_1} f_0(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx \to \min(\max),$$
$$f_j(x, y_1(x), \ldots, y_n(x), y_1'(x), \ldots, y_n'(x)) = 0, \qquad j = 1, \ldots, k,$$
with boundary conditions $y_i(x_0) = y_{i0}$, $y_i(x_1) = y_{i1}$,
$i = 1, \ldots, n$. This is the so-called Lagrange problem. How does one write
down for it the necessary extremum conditions?

Lagrange was convinced that this problem is also governed by the principle 
of “lifting the constraints” that we discussed in the twelfth story (the Lagrange 
principle). In accordance with Lagrange’s general conception, we must form 
a Lagrange function and set down the necessary condition for an extremum 
problem for the Lagrange function in the absence of constraints. But what 
does the Lagrange function for the Lagrange problem look like? Here, too, 
Lagrange displayed supreme decisiveness. In the finite-dimensional case, it 
is the sum of a functional multiplied by a number $\lambda_0$ and of products of
the constraint functions by the Lagrange multipliers. But in the case of the
Lagrange problem “there are as many constraints as points in the interval
$[x_0, x_1]$.” This being so, Lagrange proposed to multiply the $j$th equation
$f_j(x, y_1(x), \ldots, y_n(x), y_1'(x), \ldots, y_n'(x)) = 0$ by a function
$\lambda_j(x)$ of $x$, integrate over the interval $[x_0, x_1]$, and, finally,
sum over $j$. In other words, Lagrange replaced multiplication by numbers
followed by summation with multiplication by functions followed by integration.
All in all, the Lagrange function took the following form:


$$\mathcal{L} = \int_{x_0}^{x_1} f(x, y_1(x), \ldots, y_n(x), y_1'(x), \ldots, y_n'(x))\,dx,$$
where
$$f = \lambda_0 f_0 + \sum_{j=1}^{k} \lambda_j(x) f_j.$$


Then Lagrange formulated the following result: If certain functions
$\hat{y}(x) = (\hat{y}_1(x), \ldots, \hat{y}_n(x))$ yield a local minimum of the
Lagrange problem, then there are a number $\lambda_0$ and functions
$\lambda_j(x)$ such that Euler’s equation holds for the function $f$. (Of
course, Lagrange multiplied the functional by 1 rather than by $\lambda_0$.)

Lagrange did not prove his result. Of course, a result formulated without 
any restrictions cannot be true. Thus the very method underlying his supreme 
work—his Analytical mechanics—lacked a rigorous justification. This state 
of affairs continued for over a century. A completely rigorous proof of La- 
grange’s theorem was given only at the end of the nineteenth century, and 
its essence was understood only in our own century. The meaning of “was 
understood only in our own century” deserves a special discussion that is the 
subject of the next section. 


5. Conclusion. In this section we’ll say more about infinite-dimensional 
and convex analysis, the theory of optimal control and Pontryagin’s maxi- 
mum principle, the rapid-response problem, and Newton’s problem. 

This book is meant for high school students. In Part One I refrained from 
introducing any element of mathematical analysis. In Part Two we discussed 
things not covered in school, such as functions of more than one and even of 
infinitely many variables. Nevertheless, I have stayed pretty close to the high 
school curriculum. And even in this concluding section of my concluding 
mathematical story—the fifteenth story is given over to general questions 
and free-wheeling talk—I am reluctant to abandon my approach of talking 
“to the first high school student in the street” (recall Hilbert’s words). But 
I'd like to be less constrained and, in my mind, address only such a “first 
high school student in the street” who has decided to tie his or her future to 
mathematics. I intend to speak as if I could predict the future. I count on 
the student to eventually fill in the gaps in understanding that are due to the 
limitations of his or her present knowledge. 

I will also point out and partially justify certain general theses pertaining 
to the subsequent fate of the ideas discussed earlier. They seem destined to 
evolve and also to return to their sources. 
You will recall that the method of investigation of maximum and minimum
problems was elaborated first by Fermat (for polynomials) and then, in
general terms, by Newton and Leibniz. Immediately thereafter began the pe- 
riod of development of the classical calculus of variations, of investigation of 
extrema of certain functions of infinitely many variables. This period lasted 
for about two-and-a-half centuries. 

At the end of the nineteenth century Volterra, and somewhat later Fréchet, 
Hadamard, and many others, began to develop the foundations of infinite- 
dimensional analysis. In this connection it was stressed that one of the aims 
of the newly created calculus was the solution of maximum and minimum 
problems (recall the words of Volterra that form one of the epigraphs for 
this story). In the first half of this century, mathematical analysis in infinite- 
dimensional spaces (now called functional analysis, this chapter of analysis 
has unified various conceptions of classical analysis, higher algebra, and ge- 
ometry) experienced a period of explosive development and growth. But the 
mathematicians who continued to develop the calculus of variations at that 
time did not apply the general theorems of functional analysis to the calculus 
of variations and, furthermore, were unaware that what was being elaborated 
was an apparatus for this theory. In his textbook on the calculus of variations 
published in the 1940s, in which he summarized the whole development of 
this discipline at a time when all the necessary results of functional analysis 
were already common knowledge, George Bliss, one of this century’s fore- 
most experts on the calculus of variations, spoke in highly skeptical terms 
of the potential utility of this general approach for the subject’s “concrete” 
problems. 

At this point it is perhaps time to formulate our first thesis. 

Infinite-dimensional analysis (more precisely, the differential and integral 
calculus in infinite-dimensional spaces), a division of mathematics based on 
exactly the same ideas as finite-dimensional analysis and just as simple and 
natural as the latter, provides as natural an apparatus for the classical cal- 
culus of variations as does finite-dimensional analysis for the theory of finite- 
dimensional extremal problems. Also, the fundamental theorems of the dif- 
ferential and integral calculus in infinite-dimensional spaces are just as simple 
and natural as their finite-dimensional analogs. 

The basic concepts of the differential and integral calculus of functions of 
one variable are the derivative and the differential. We defined them first in 
the eleventh story. 

We say that a function $F(x)$ defined on the real line $\mathbb{R}$ is
differentiable at a point $x_0$ if there is a linear function $y = kx$ such that
$F(x_0 + x) - F(x_0) = kx + r(x)$, where $\lim_{|x| \to 0} |r(x)|/|x| = 0$.

The linear function $y = kx$ is called the differential of $F$ at the point
$x_0$.

Infinite-dimensional analysis is concerned with functions defined on spaces 
with a norm, known as normed spaces. Examples of normed spaces include 
the spaces $C([a, b])$ and $C^1([a, b])$ that we encountered earlier. More
generally, a normed space is any set $Y$ of elements $y$ that can be handled
like plane vectors, that is, that can be added and multiplied by numbers, and,
in addition, are each assigned a number $\|y\|$ such that

(1) $\|y\| \ge 0$ (nonnegativity) and $\|y\| = 0$ only if $y = 0$,

(2) $\|\alpha y\| = |\alpha|\,\|y\|$ for all $y \in Y$ and $\alpha \in \mathbb{R}$ (homogeneity),

(3) $\|y_1 + y_2\| \le \|y_1\| + \|y_2\|$ for all $y_1, y_2 \in Y$ (the triangle inequality).

Other examples of normed spaces are the line $\mathbb{R}$ with $\|y\| = |y|$ and
the $n$-dimensional space of vectors $y = (y_1, \ldots, y_n)$ with various
norms, exemplified by


$$\|y\| = |y_1| + \cdots + |y_n| \quad\text{and}\quad \|y\| = \sqrt{y_1^2 + \cdots + y_n^2},$$


and the spaces $C$ and $C^1$ above.

I have already mentioned that a function $K(y)$ is said to be linear if
$K(y_1 + y_2) = K(y_1) + K(y_2)$ for all $y_1, y_2 \in Y$ and
$K(\alpha y) = \alpha K(y)$ for all $y \in Y$ and $\alpha \in \mathbb{R}$.

We can now give a general definition of a derivative. 

We say that a function $F(y)$ defined on a normed space $Y$ is differentiable
at a point $y_0$ if there is a linear function $K(y)$ such that


$$F(y_0 + y) - F(y_0) = K(y) + r(y),$$


where $\lim_{\|y\| \to 0} |r(y)|/\|y\| = 0$.

The linear function $K(y)$ is called the differential of $F$ at the point $y_0$.

Don’t you agree that this is much the same thing? That’s not all. The 
definition of a differential in infinite-dimensional analysis was given in the 
beginning of this century by the French mathematician M. Fréchet. Let’s see 
what this definition leads to in the finite-dimensional case. 

We will say that a function $F(y) = F(y_1, \ldots, y_n)$ of $n$ variables is
differentiable at a point $y_0 = (y_{01}, \ldots, y_{0n})$ if there is a linear
function (recall that any such function has the form
$K(y) = k_1 y_1 + \cdots + k_n y_n$) such that
$$F(y_{01} + y_1, \ldots, y_{0n} + y_n) - F(y_{01}, \ldots, y_{0n}) = K(y) + r(y),$$


where $\lim_{\|y\| \to 0} |r(y)|/\|y\| = 0$. As norm we can take any norm in
$n$-dimensional space.
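The definition can be tried out on a concrete function of two variables. In the sketch below the function $F(y_1, y_2) = y_1^2 + \sin y_2$, the base point $(1, 0)$, and the Euclidean norm are hypothetical choices; the candidate differential is $K(y) = 2 y_1 + y_2$, built from the partial derivatives at that point.

```python
import math

def F(y1, y2):
    """Hypothetical sample function of two variables."""
    return y1 * y1 + math.sin(y2)

def K(y1, y2, at=(1.0, 0.0)):
    """Candidate differential of F at the point `at`: k1*y1 + k2*y2."""
    return 2.0 * at[0] * y1 + math.cos(at[1]) * y2

def remainder_ratio(y1, y2, at=(1.0, 0.0)):
    """|r(y)| / ||y|| with r(y) = F(y0 + y) - F(y0) - K(y), Euclidean norm."""
    r = F(at[0] + y1, at[1] + y2) - F(*at) - K(y1, y2, at)
    return abs(r) / math.hypot(y1, y2)

ratios = [remainder_ratio(s, s) for s in (1e-1, 1e-2, 1e-3)]
print(ratios)  # the ratio shrinks roughly in proportion to the size of the increment
```

A shrinking ratio along one family of increments is of course only evidence, not a proof, that the limit in the definition is zero.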

Nowadays this definition appears in every textbook on analysis. But 
Fréchet believed that he was the first to think of it! That’s how he puts 
it: “the differential in my sense of the term.” Ordinary finite-dimensional
analysis existed for two and a half centuries before him and yet this top 
mathematician of the beginning of this century thought that he was the first 
to proffer the correct definition of the fundamental concept of analysis—that 
of a differential! True, it then turned out that “his” definition had already 
been given by Weierstrass in unpublished works (written in the 1860s) and 
is found in early twentieth-century English and German textbooks, but the
definition was actually absent from the scientific literature. The fact remains 
that it was easier to conceive the infinite-dimensional definition than the 
finite-dimensional one! 

We go on. What are the fundamental theorems of the differential calcu- 
lus? We will begin with two, namely the chain rule and the inverse function 
theorem.

The chain rule (for a function of one variable) states that 


(F(G(x)))' = F'(G(x))G'(x). 


When appropriately interpreted, this formula holds in infinite-dimensional 
analysis as well. It is proved with equal ease in both cases. 

The inverse function theorem (for a function of one variable) states that 
if y = f(x) is a continuously differentiable function such that f(0) = 0 and 
f'(0) #0, then / is invertible near zero, that is, for any small enough y 
there is a unique X such that y= f(x). 

How is this theorem proved? Of the many different proofs I prefer the
one that goes back to Newton. It consists in constructing a sequence $\{x_n\}_{n \ge 0}$
that converges stepwise to $\hat{x}$. This sequence is constructed according to the
following rule:

(1) $x_{n+1} = x_n + (f'(0))^{-1}(y - f(x_n)), \quad n \ge 0,$

and is represented geometrically in Figure 14.3. The zero-th approximation
$x_0$ is taken arbitrarily, but sufficiently close to zero.
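
The iteration in formula (1) is easy to try on a computer. What follows is only a minimal sketch: the test function `f`, the helper name `invert_near_zero`, and the step count are illustrative assumptions, not taken from the text. The slope is frozen at its value at zero, exactly as in (1), rather than recomputed at each step.

```python
import math

def invert_near_zero(f, dfdx_at_zero, y, x0=0.0, steps=50):
    """Solve f(x) = y by the rule x_{n+1} = x_n + (y - f(x_n)) / f'(0)."""
    x = x0
    for _ in range(steps):
        x = x + (y - f(x)) / dfdx_at_zero
    return x

# Illustrative choice (an assumption): f(0) = 0 and f'(0) = 1.
f = lambda x: math.sin(x) + x * x
x_hat = invert_near_zero(f, 1.0, y=0.1)
print(x_hat, f(x_hat))   # f(x_hat) should reproduce y = 0.1
```

Each step moves along a line with the fixed slope f'(0), which is why the starting point and the value of y must both be close to zero for the sequence to converge.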

The same result holds in the infinite-dimensional case. I'll risk formulating 
it. Here it is not enough to have a norm. Another requirement is the so- 
called completeness property, “the absence of gaps” (the rational numbers 
don’t form a complete space because they have gaps like $\sqrt{2}$, which is not
rational). A complete normed space is called a Banach space. 

Now let $X$ and $Y$ be two Banach spaces and let $F$ be a function that maps
$X$ into $Y$. Its derivative is a continuous linear mapping of $X$ into $Y$. The
infinite-dimensional version of our “one-dimensional statement” $f'(0) \ne 0$ is
that the derivative $F'(0)$ is invertible (that is, $F'(0)^{-1}$ is also a continuous
linear mapping of $Y$ into $X$). And now the inverse function theorem takes
an entirely analogous form: if $X$ and $Y$ are two Banach spaces and
$y = F(x)$ is a continuously differentiable function such that $F(0) = 0$ and $F'(0)$
is invertible, then for every $y$ close to zero there is a unique $\hat{x}$ such that
$F(\hat{x}) = y$.

The proof is basically unchanged: the proof of the convergence of the $x_n$
defined by formula (1) to $\hat{x}$ is entirely analogous to the one on the line $\mathbb{R}$.

In summary, the fundamental concept of a differential, the formulations 
of the basic theorems and their meaning and proofs—all these are basically 
the same in the one-dimensional and the infinite-dimensional cases. Also, 
the Lagrange principle holds in infinite-dimensional analysis. Its foundations 
are the same as in the finite-dimensional case and the relevant proofs are just 
as simple. The details follow. 

Let $X$ and $Y$ be Banach spaces. Let $f_i$ be functionals on $X$,
$i = 0, 1, \dots, m$, and let $F$ be a mapping from $X$ into $Y$. We consider the
problem

(p) $f_0(x) \to \min(\max), \quad F(x) = 0, \quad f_i(x) = 0, \quad i = 1, \dots, m.$

In the absence of the mapping $F$ and the functions $f_i$, $i = 1, \dots, m$, we obtain the
unconstrained problem

(p’) $f_0(x) \to \min(\max).$


As applied to problem (p), the Lagrange principle is formulated using the 
very same words that are used in the finite-dimensional case and in the case 
of the isoperimetric problems of the classical calculus of variations, namely: 

1. For problem (p’) the necessary extremum condition is the relation

$$f_0'(\hat{x}) = 0$$

(Fermat’s theorem).

2. To solve problem (p) we must form the Lagrange function of the prob- 
lem and treat it as if the variables were independent. 

The standard form of the Lagrange function for problem (p) is

$$\mathcal{L} = \mathcal{L}(x, \lambda_0, \dots, \lambda_m, \Lambda) = \lambda_0 f_0(x) + \cdots + \lambda_m f_m(x) + \Lambda(F(x)),$$

where $\lambda_0, \dots, \lambda_m$ are numbers and $\Lambda(y)$ is a linear function on $Y$.
Thus, if $\hat{x}$ is a local extremum, then there are numbers $\lambda_0, \dots, \lambda_m$ and
a linear function $\Lambda(y)$ such that Fermat’s theorem holds for the Lagrange
function:

$$\mathcal{L}_x(\hat{x}, \lambda_0, \dots, \lambda_m, \Lambda) = 0.$$


But this is how one formulates the Lagrange multiplier rule in the finite- 
dimensional case, as witness the corresponding theorem in the twelfth story. 
The just-formulated theorem on the Lagrange multiplier rule was proved in 
1934 by the Soviet mathematician L. A. Lyusternik. 
It must be added that in the infinite-dimensional case, in addition to the
smoothness of the functions and mappings involved in the statement of the 
problem (this was the only requirement in the finite-dimensional case), there 
are two more requirements. 

The first requirement is that the spaces X and Y must be complete, 
that is, they must be Banach spaces. The second is that the mapping F 
have certain special properties usually referred to as regularity properties (a 
sufficient condition for regularity, and thus for the validity of the Lagrange 
multiplier rule, is that the derivative F’(x) maps X onto Y). The infinite- 
dimensional result is proved as simply as the finite-dimensional one. A reader 
who wishes to verify this should consult the book [2R] (in Russian) where 
both the finite-dimensional and the infinite-dimensional cases are proved. 
The proofs follow strictly parallel paths except that occasionally certain well- 
known results of advanced algebra and classical analysis are replaced by their 
generalizations in functional analysis (these generalizations are part of the 
irreducible minimum of university training in mathematics). 

What are convexity and convex analysis? We learn about convexity in high
school geometry. Recall that a figure in the plane or in space is said to be
convex if together with any two of its points it contains the segment joining
them; a function is said to be convex if its graph lies not higher than the chord
joining any two points on that graph. Figure 11.1 shows various convex
and nonconvex figures. A triangle is always convex. There are nonconvex
quadrilaterals. Linear ($y = ax$, $y = a_1 x_1 + \cdots + a_n x_n$) and affine
($y = ax + b$, $y = a_1 x_1 + \cdots + a_n x_n + b$) functions are convex; of the quadratic
trinomials $y = ax^2 + bx + c$, $a \ne 0$, only those with $a > 0$ are convex.
Convex functions can be defined analytically. A function $y = f(x)$ is said
to be convex if and only if for any two points $x_1$ and $x_2$ and any number
$\alpha$ between 0 and 1, we have Jensen’s inequality

$$f(\alpha x_1 + (1 - \alpha) x_2) \le \alpha f(x_1) + (1 - \alpha) f(x_2).$$

This inequality can be extended to the case of $n$ points:

$$f(\alpha_1 x_1 + \cdots + \alpha_n x_n) \le \alpha_1 f(x_1) + \cdots + \alpha_n f(x_n)$$

(provided that $\alpha_1 \ge 0, \dots, \alpha_n \ge 0$, $\sum_{i=1}^{n} \alpha_i = 1$).

The following is an important convexity criterion for functions of one
variable: if a function is twice differentiable and $f''(x) \ge 0$ for all $x$ then it
is convex. Now let’s look at the functions in the table given in the eleventh
story. It is easy to see that the function $y = |x|^\alpha$ is convex if and only if
$\alpha \ge 1$, the function $y = a^x$ ($a > 0$ and $a \ne 1$) is always convex, and so are the
functions $y = \log_a x$, $0 < a < 1$, and $y = -\ln x$. The functions $y = \sin x$
and $y = \cos x$ are not convex.
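
Jensen’s inequality can be probed numerically. The sketch below is an illustration only: the helper `violates_jensen`, the sampling intervals, and the tolerance are assumptions, not part of the text. Random sampling can expose a violation but can never prove convexity.

```python
import math
import random

def violates_jensen(f, lo, hi, trials=10_000, seed=0):
    """Return True if some sampled (x1, x2, alpha) breaks Jensen's inequality."""
    rng = random.Random(seed)
    for _ in range(trials):
        x1, x2 = rng.uniform(lo, hi), rng.uniform(lo, hi)
        alpha = rng.random()
        lhs = f(alpha * x1 + (1 - alpha) * x2)
        rhs = alpha * f(x1) + (1 - alpha) * f(x2)
        if lhs > rhs + 1e-12:   # small slack for rounding errors
            return True
    return False

print(violates_jensen(math.exp, -3, 3))                   # convex: no violation found
print(violates_jensen(lambda x: -math.log(x), 0.1, 10))   # convex for x > 0
print(violates_jensen(math.sin, 0, math.pi))              # concave on [0, pi]: violated
```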

Convex figures form a rather narrow and specialized class in the totality of 
figures. A similar statement is true of convex functions. But convexity plays 
a very important part in mathematics and in its applications. The interesting
ideas associated with convexity, and the wealth of applications, have led to 
the creation of a chapter of mathematics called “convex analysis.” Its final 
formation took place relatively recently, about 20 years ago. In this chapter 
of mathematics one studies properties of convex sets, convex functions, and 
convex extremal figures. Of course, in this book we are primarily interested 
in convex extremal problems. 

It was precisely the abundance of convex extremal problems that inevitably 
led to a deep study of convexity and, in effect, to the creation of convex 
analysis. The number of convex problems is particularly large in economics. 
We've already talked about one economic problem, namely the transportation 
problem. Such problems arise constantly in real life. Very frequently, the 
formalization of such problems has shown that the functions to be maximized 
or minimized, as well as the functions defining the constraints (of the types of 
equality and inequality), are linear. The methods for solving these problems 
form a special chapter of convex analysis known as linear programming. 
The first papers dealing with this field are due to the Soviet mathematician, 
academician L. V. Kantorovich, winner of the Lenin and Nobel prizes.

What is studied in convex analysis? One important area is the so-called 
“convex calculus” that has much in common with the differential calculus. 
We'll explain what the two calculi have in common. 

Not all convex functions are differentiable. An example of a convex function
that is not differentiable at zero is the function $y = |x|$. We have already
seen that this function has no tangent (at zero). But every convex function
$y = f(x)$ of one variable has two “halftangents.” (See Figure 14.4.)

This means that there always exist the limits (in the sequel $\lambda > 0$)

$$f'_-(x_0) = \lim_{\lambda \to 0} \frac{f(x_0 - \lambda) - f(x_0)}{-\lambda}, \qquad
f'_+(x_0) = \lim_{\lambda \to 0} \frac{f(x_0 + \lambda) - f(x_0)}{\lambda}.$$

The segment $[f'_-(x_0), f'_+(x_0)]$ (which usually degenerates to a point) is called
the subdifferential of the function $f$ at the point $x_0$ and is denoted by
$\partial f(x_0)$. If $f$ is constant, then $\partial f(x) = 0$, and if $f(x) = ax + b$, then
$\partial f(x) = a$ at each point $x$. In general, if $f(x)$ is differentiable at $x_0$, then
$\partial f(x_0) = f'(x_0)$.

If a differentiable function has a local minimum at a point $x_0$, then, as
we know, Fermat’s theorem implies the equality $f'(x_0) = 0$. In this connection
we noted that this is a necessary but not a sufficient condition for an
extremum. If a function is convex, then its local minimum is always global,
or absolute. This is one of the remarkable properties of a convex function.
A necessary and sufficient condition for minimality of a convex function at
a point $x_0$ is the relation $0 \in \partial f(x_0)$. This relation denotes a very simple
condition: For a convex function to have a minimum at $x_0$ it is necessary
and sufficient that the constant function equal to $f(x_0)$ does not lie above
the graph of $f$.
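
The two half-tangent slopes, and the test “zero lies in the subdifferential,” can be approximated by difference quotients. This is a rough sketch under stated assumptions: the function `f` below, the step `h`, and the tolerance `tol` are illustrative choices, and the helper names are hypothetical.

```python
def one_sided_slopes(f, x0, h=1e-6):
    """Approximate the left and right half-tangent slopes of f at x0."""
    left = (f(x0 - h) - f(x0)) / (-h)
    right = (f(x0 + h) - f(x0)) / h
    return left, right

def zero_in_subdifferential(f, x0, h=1e-6, tol=1e-4):
    """Check numerically whether 0 lies in the segment [left slope, right slope]."""
    left, right = one_sided_slopes(f, x0, h)
    return left - tol <= 0.0 <= right + tol

f = lambda x: abs(x) + 0.5 * x          # convex, with a kink at zero
print(one_sided_slopes(f, 0.0))         # roughly (-0.5, 1.5)
print(zero_in_subdifferential(f, 0.0))  # the minimum is at the kink
print(zero_in_subdifferential(f, 1.0))  # slope 1.5 here, so no minimum
```

At the kink the segment of slopes contains zero, so the point is a minimum even though no derivative exists there.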

The notion of a subdifferential can be extended to the case of a function 
of n variables. Unlike the derivative, a subdifferential is not a vector but a 
certain convex set of vectors. Also, certain formulas similar to the formulas 
of the differential calculus hold. One such formula is 


$$\partial(f + g)(x) = \partial f(x) + \partial g(x).$$

This generalizes the formula $(f + g)'(x) = f'(x) + g'(x)$ discussed in the
eleventh story. The formula for the subdifferential of a sum means that in
order to find the subdifferential of the function $f + g$ at a point $x$, we must
take the sets $A = \partial f(x)$ and $B = \partial g(x)$ and form the set $A + B$ of sums
$a + b$, $a \in A$, $b \in B$.

The convex calculus consists of relations similar to the formula for the 
subdifferential of a sum. 

The most important idea in convex analysis is that convex sets always 
admit dual descriptions and that for each convex set there is always a “dual” 
set. For example, a plane convex figure can be described as the totality of 
its points or as the intersection of all halfplanes (halfspaces) that contain it. 
(See the description of a triangle in Figure 14.5.) Similarly, every convex 
function can be described as the function itself or as the maximum of all 
affine functions that don’t exceed it. 

The latter description brings us to one of the most important notions of 
classical analysis, namely the Legendre transform.

Let $y = f(x)$ be a convex function. Its Legendre transform is the function
$y = f^*(z)$, $f^*(z) = \max_x (xz - f(x))$.

Let’s look at an example. Let $f_p(x) = |x|^p/p$, $p > 1$. (We recall that for
$p > 1$ the function $y = f_p(x)$ is convex.) To find $\max_x (xz - f_p(x))$ we apply
Fermat’s theorem.

We have $(d/dx)(xz - f_p(x)) = z - |x|^{p-1} \operatorname{sign} x = 0$. Then
$x = |z|^{p'-1} \operatorname{sign} z$. Here $(p')^{-1} + p^{-1} = 1$. Also,

$$f_p^*(z) = zx - |x|^p/p = |z|^{p'} - (x|x|^{p-1} \operatorname{sign} x)/p
= |z|^{p'}(1 - 1/p) = |z|^{p'}/p' =: f_{p'}(z).$$

Thus the functions $f_p(x)$ and $f_{p'}(x)$ are dual to one another: each is
the Legendre transform of the other. Also, the function $f_2(x) = x^2/2$ is
“self-dual.”
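
The duality of the power functions can be checked on a grid. The following is a sketch only: the exponent p = 3, the grid on [-5, 5], and the helper name `legendre` are assumptions made for illustration, and the maximum over all x is replaced by a maximum over grid points.

```python
def legendre(f, z, xs):
    """Approximate max over x of (x*z - f(x)) by a maximum over the grid xs."""
    return max(x * z - f(x) for x in xs)

p = 3.0
p_dual = p / (p - 1.0)                 # chosen so that 1/p + 1/p' = 1
f_p = lambda x: abs(x) ** p / p
xs = [i / 1000.0 for i in range(-5000, 5001)]   # grid on [-5, 5]

for z in (-2.0, 0.5, 1.0, 3.0):
    approx = legendre(f_p, z, xs)
    exact = abs(z) ** p_dual / p_dual  # the closed form |z|^{p'}/p'
    print(z, approx, exact)
```

The grid must be wide enough to contain the maximizing point; for the values of z used here that point lies well inside [-5, 5].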

Here we find the reason for the Cauchy-Bunyakovskii and Hölder inequalities
discussed in the fifth story. Incidentally, most of the inequalities
discussed in that story are related to convex analysis. For example, the
arithmetic-geometric means inequality is an instance of the Jensen inequality.

In fact, assume that $x_i > 0$, $i = 1, \dots, n$. Then

$$(x_1 \cdots x_n)^{1/n} = e^{\frac{\ln x_1 + \cdots + \ln x_n}{n}} \le \frac{1}{n}\left(e^{\ln x_1} + \cdots + e^{\ln x_n}\right) = \frac{x_1 + \cdots + x_n}{n}.$$

We’ve used the Jensen inequality for the convex function $y = e^x$.

Here is one more extremely important thesis of convex analysis. Consider 
a triangle ABC. (See Figure 14.6.) If we erase everything but its vertices,
we can still reconstruct the triangle. The same can be said about a square, 
a rhombus, and, quite generally, about any convex polygon. They can all be 
reconstructed from their vertices. A vertex of a polygon can be characterized 
by the fact that, unlike the other points of the polygon, it is not the midpoint 
of a segment whose endpoints belong to the polygon. 

It turns out that every bounded and closed convex set has “extremal” 
points, that is, points that are not midpoints of segments belonging to this set. 
For example, in the case of a disk, the set of extremal points coincides with its
boundary circle. Also, it turns out that every convex set can be reconstructed
from its extremal points. 
Recall the formalization of problems of linear programming:

$$a_1 x_1 + \cdots + a_n x_n \to \min(\max),$$
$$a_{i1} x_1 + \cdots + a_{in} x_n \le b_i, \quad i = 1, \dots, m,$$
$$x_j \ge 0, \quad j = 1, \dots, n.$$


In such problems the constraints form a “polyhedron” that can be recon- 
structed from its set of vertices. Also, it is easy to see that the minimum (or 
maximum) of a linear function is attained at one of its vertices. 

One of the world’s best known numerical methods is the so-called simplex 
method. It enables the user to go from one vertex to another with smaller 
(larger) value of the minimized (maximized) function. The required ex- 
tremum is found in a finite number of steps. 
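
The fact that a linear function attains its extremum at a vertex can be seen on a tiny example. Note that the sketch below is not the simplex method: it simply lists all vertices of a small plane polygon and compares the values of the objective there. The constraint data and the helper name `vertices` are illustrative assumptions.

```python
from itertools import combinations

# Constraints in the form a*x + b*y <= c (illustrative data).
constraints = [
    (1.0, 1.0, 4.0),    # x + y <= 4
    (1.0, 3.0, 6.0),    # x + 3y <= 6
    (-1.0, 0.0, 0.0),   # x >= 0
    (0.0, -1.0, 0.0),   # y >= 0
]
objective = (1.0, 2.0)  # maximize x + 2y

def vertices(cons, eps=1e-9):
    """Intersect every pair of boundary lines and keep the feasible points."""
    pts = []
    for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < eps:
            continue  # parallel boundary lines never meet
        x = (c1 * b2 - c2 * b1) / det   # Cramer's rule
        y = (a1 * c2 - a2 * c1) / det
        if all(a * x + b * y <= c + eps for a, b, c in cons):
            pts.append((x, y))
    return pts

best = max(vertices(constraints),
           key=lambda pt: objective[0] * pt[0] + objective[1] * pt[1])
print(best)   # the vertex where x + 2y is largest
```

Listing all vertices is hopeless for large problems, since their number grows very quickly; the simplex method avoids this by moving only between neighboring vertices that improve the objective.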

The theory of convex extremal problems is called convex programming.
The following formulation describes a rather extensive class of finite-dimensional
problems of convex programming:

(p') $f_0(x) \to \min, \quad f_i(x) = 0, \quad i = 1, \dots, m',$
$f_i(x) \le 0, \quad i = m'+1, \dots, m, \quad x \in A.$


Compared with the formulation in the beginning of this section we have 
the following modifications. First, this is a minimum problem; second, the 
function $f_0$ and the functions that prescribe the inequalities must be convex;
third, the functions that prescribe the equalities must be affine; and, finally, 
there is the restriction x € A. Also, the set A is supposed to be convex. In 
this case—for convex problems—Lagrange’s words regarding his principle, 
words that serve as the epigraph for this story, are entirely correct. In fact, 
the following theorem holds: 

If an admissible point $\hat{x}$ yields an absolute minimum for the convex programming
problem (p') then there are numbers $\lambda_0, \dots, \lambda_m$ not all zero such
that: (A) the nonnegativity conditions $\lambda_0 \ge 0$, $\lambda_i \ge 0$, $i \ge m'+1$ hold; (B)
the slack variable conditions $\lambda_i f_i(\hat{x}) = 0$, $i \ge m'+1$ hold; and, finally, (C)
we have the so-called minimum principle which states that $\hat{x}$ is a minimum
point of the Lagrange function in the problem

$$\mathcal{L}(x, \lambda_0, \dots, \lambda_m) \to \min, \quad x \in A, \quad \mathcal{L} = \lambda_0 f_0(x) + \cdots + \lambda_m f_m(x).$$

If numbers $\lambda_0, \dots, \lambda_m$ are found with $\lambda_0 = 1$ satisfying the relations (A)-(C)
then $\hat{x}$ is an absolute minimum for the problem.

This theorem was proved rather recently, in 1951, by the American math- 
ematicians Kuhn and Tucker. It plays the role of the Lagrange multiplier 
rule for convex programming problems. 
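
Conditions (A)-(C) can be checked by hand or by machine on a small example. The sketch below assumes an illustrative problem that is not from the text: minimize x² + y² in the plane subject to 1 - x - y ≤ 0, with candidate point (1/2, 1/2) and multipliers λ₀ = λ₁ = 1. Random sampling stands in for a proof of the minimum principle (C).

```python
import random

f0 = lambda x, y: x * x + y * y     # convex objective (illustrative)
f1 = lambda x, y: 1.0 - x - y       # affine function, constraint f1 <= 0

x_hat = (0.5, 0.5)                  # candidate minimum point
lam0, lam1 = 1.0, 1.0               # candidate multipliers

def lagrange(x, y):
    return lam0 * f0(x, y) + lam1 * f1(x, y)

rng = random.Random(1)
samples = [(rng.uniform(-3, 3), rng.uniform(-3, 3)) for _ in range(5000)]

cond_a = lam0 >= 0 and lam1 >= 0                      # (A) nonnegativity
cond_b = abs(lam1 * f1(*x_hat)) < 1e-12               # (B) slackness
cond_c = all(lagrange(*x_hat) <= lagrange(x, y) + 1e-12
             for x, y in samples)                     # (C) sampled minimum principle
# Sufficiency with lam0 = 1: no admissible sample should beat x_hat.
no_better = all(f0(*x_hat) <= f0(x, y) + 1e-12
                for x, y in samples if f1(x, y) <= 0)
print(cond_a, cond_b, cond_c, no_better)
```

Here the Lagrange function equals (x - 1/2)² + (y - 1/2)² + 1/2, so its minimum over the whole plane is visibly attained at the candidate point.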

Let’s end this story by solving Dido’s problem by means of the tools of 
convex analysis. 
Let $A_1$ and $A_2$ be two plane convex figures. By $\alpha_1 A_1 + \alpha_2 A_2$ we denote
the figure formed by all vectors $x$ that are representable as $x = \alpha_1 x_1 + \alpha_2 x_2$
with $x_1 \in A_1$, $x_2 \in A_2$, and $\alpha_1 \ge 0$, $\alpha_2 \ge 0$. The sum of a polygon and a
circle is shown in Figure 14.7.

Let $S(A)$ denote the area of the set $A$. In the middle of the nineteenth
century, the German geometer Brunn proved the following important inequality:

(1) $\sqrt{S(\alpha A_1 + (1 - \alpha) A_2)} \ge \alpha \sqrt{S(A_1)} + (1 - \alpha) \sqrt{S(A_2)};$

then at the end of the nineteenth century H. Minkowski proved that equality
in (1) is possible if and only if $A_1$ and $A_2$ are similar. Proofs of these facts
and their generalizations, known as the Brunn-Minkowski inequality, can be
found in [7R].

Now let $A$ be a convex figure and $B$ a unit circle. Then we have the
following interesting formula (proved by Steiner):

(2) $S(A + \rho B) = S(A) + p(A)\rho + \pi\rho^2.$

In (2), $p(A)$ stands for the perimeter of $A$. For a convex polygon formula
(2) is obvious. (See Figure 14.7.) In the general case it can be proved by
passage to the limit.

We’ll show how formulas (1) and (2) imply immediately the solution of the
isoperimetric problem. In what follows we use the fact that $p(\alpha A) = \alpha p(A)$
and $S(\alpha A) = \alpha^2 S(A)$. We have

$$\sqrt{S(\alpha A + (1 - \alpha)B)} \overset{(2)}{=} \sqrt{S(\alpha A) + p(\alpha A)(1 - \alpha) + \pi(1 - \alpha)^2}$$
$$= \sqrt{\alpha^2 S(A) + \alpha(1 - \alpha)p(A) + \pi(1 - \alpha)^2}$$
$$\overset{(1)}{\ge} \alpha\sqrt{S(A)} + (1 - \alpha)\sqrt{S(B)} = \alpha\sqrt{S(A)} + (1 - \alpha)\sqrt{\pi}.$$

Squaring and eliminating like terms, we obtain the isoperimetric inequality
$p^2(A) \ge 4\pi S(A)$ obtained in the second story. Here equality holds only if $A$
is a circle. This is yet another solution of Dido’s problem.
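
For readers who want to see the squaring step written out, here is one way to do it, assuming $0 < \alpha < 1$ (so that division by $\alpha(1 - \alpha)$ is allowed):

```latex
\alpha^2 S(A) + \alpha(1-\alpha)\,p(A) + \pi(1-\alpha)^2
  \ge \alpha^2 S(A) + 2\alpha(1-\alpha)\sqrt{\pi S(A)} + \pi(1-\alpha)^2
\;\Longrightarrow\;
\alpha(1-\alpha)\,p(A) \ge 2\alpha(1-\alpha)\sqrt{\pi S(A)}
\;\Longrightarrow\;
p(A) \ge 2\sqrt{\pi S(A)}
\;\Longrightarrow\;
p^2(A) \ge 4\pi S(A).
```

The first line is the square of the chain of relations above; the terms $\alpha^2 S(A)$ and $\pi(1-\alpha)^2$ appear on both sides and cancel.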

Now we'll go back to the eighteenth century once more. 
At one point I stated that Lagrange formulated, actively applied and promoted
his principle of lifting constraints in variational problems, but that
he didn’t prove it. The justification of this Lagrange principle for problems 
of the classical calculus of variations fell to mathematicians of later genera- 
tions, especially those active at the end of the nineteenth century (Mayer and 
others). In this connection we note that Lagrange’s principle follows directly 
from the fundamental theorems of infinite dimensional analysis. 

How do we embed the classical calculus of variations in infinite-dimensional
analysis? We adduce a special, but extraordinarily important, case of
the general Lagrange problem.

$$I_0(y_1, \dots, y_n, u_1, \dots, u_r) = \int_{x_0}^{x_1} f_0(x, y_1(x), \dots, y_n(x), u_1(x), \dots, u_r(x))\,dx \to \min(\max),$$

($p_1$) $y_1' = \varphi_1(x, y_1, \dots, y_n, u_1, \dots, u_r), \quad \dots, \quad y_n' = \varphi_n(x, y_1, \dots, y_n, u_1, \dots, u_r),$

with boundary conditions $y_i(x_0) = y_{i0}$, $y_i(x_1) = y_{i1}$, $i = 1, \dots, n$.

In this formulation the unknown functions are separated into two classes.
Some ($y_1(x), \dots, y_n(x)$) are involved in differential equations. Others
($u_1(x), \dots, u_r(x)$) can be freely chosen. They are called controls. The
scheme of a differential equation or a system of differential equations with
control functions describes a multitude of phenomena that allow human
interference.

If we assume (subject to the condition that $f_0, \varphi_1, \dots, \varphi_n$ are continuously
differentiable) that all the $y$’s are continuously differentiable and the
controls $u_1(x), \dots, u_r(x)$ are continuous, then we can consider the mapping
that associates to a set of functions $y_1(x), \dots, y_n(x), u_1(x), \dots, u_r(x)$ the
set of continuous functions

$$z_i(x) = y_i'(x) - \varphi_i(x, y_1(x), \dots, y_n(x), u_1(x), \dots, u_r(x)), \quad i = 1, \dots, n.$$

We write this mapping in the form $z(x) = F(y_1(x), \dots, u_r(x))$, where
$z(x) = (z_1(x), \dots, z_n(x))$. The differential equation in ($p_1$) can be viewed
as the equality $F(y_1(x), \dots, u_r(x)) = 0$. In sum, we obtain a problem of
the form

$$I_0(y_1, \dots, u_r) \to \min(\max), \quad F(y_1(x), \dots, u_r(x)) = 0,$$

with boundary conditions $y_i(x_0) = y_{i0}$, $y_i(x_1) = y_{i1}$, $i = 1, \dots, n$.
Lagrange’s idea can be applied to this problem. Here the Lagrange function
takes the form

$$\mathcal{L} = I_0 + \int_{x_0}^{x_1} \big(p_1(x)(y_1' - \varphi_1(x, y_1, \dots, u_r)) + \cdots + p_n(x)(y_n' - \varphi_n(x, y_1, \dots, u_r))\big)\,dx.$$

Next we must consider conceptually the problem

$$\mathcal{L} \to \min(\max), \quad y_i(x_0) = y_{i0}, \quad y_i(x_1) = y_{i1}, \quad i = 1, \dots, n.$$

We’ll deal with this problem as if there were no constraints by writing down a
system of Euler equations. In this case, the correctness of Lagrange’s principle
follows from the previously mentioned theorem of Lyusternik.

I’ll conclude this section with an account of problems of optimal control.
This extremely important chapter of the theory of extremal problems was
derived from problems with technical content. Recall the simplest problem
of optimal control from the first story. Suppose a cart moves rectilinearly,
without friction, on horizontal rails. (See Figure 14.8.) The cart is controlled
by an external force that can be changed within prescribed bounds, and it must
be stopped at a specified position in the shortest possible time. This simplest
problem of rapid response is formalized as follows.

Let the mass of the cart be $m$, its initial coordinate $x_0$, and its initial
velocity $v_0$. We denote the external force (the pulling force) by $u$, and
the running coordinate of the cart by $x(t)$. Then the velocity of the cart
is $v(t) = dx(t)/dt$ and its acceleration is $a(t) = d^2x(t)/dt^2$. We denote
the terminal moment by $T$. By Newton’s law, the external force $u(t)$ is
equal to $m\,a(t)$. The constraints on the force take the form of inequalities,
$u_1 \le u(t) \le u_2$. Thus we have the following formalization:

$$T \to \min, \quad m\frac{d^2x}{dt^2} = u, \quad x(0) = x_0, \quad \frac{dx(0)}{dt} = v_0, \quad x(T) = \frac{dx(T)}{dt} = 0, \quad u_1 \le u \le u_2.$$


If there were no constraints in the form of inequalities, then the problem 
would fit within the classical calculus of variations. But the presence of 
nonstrict inequalities on the controls rules out the application of the methods 
of that calculus. 

Problems with constraints of this type are called problems of optimal control.
Their theory was elaborated by the Soviet scientist Pontryagin and
his colleagues Boltyanskii, Gamkrelidze, and Mishchenko. The fundamental
method of solution of such problems discovered by them is known as the
Pontryagin maximum principle.

What sort of thing is this? A problem of optimal control is formulated
almost like ($p_1$) except that there are additional constraints imposed on the
controls. We write the latter as

(*) $(u_1(x), \dots, u_r(x)) \in U,$

where $U$ is some fixed set of vectors $(u_1, \dots, u_r)$.

Let’s examine a very simple example of an optimal-control problem. Let
$f(x, u)$ be a continuous function of two variables. We’ll consider the problem

($p_2$) $\int_{x_0}^{x_1} f(x, u(x))\,dx \to \min, \quad u(x) \in [u_1, u_2] \quad (\Leftrightarrow u_1 \le u(x) \le u_2).$

We’ll restrict ourselves to the following special case of this problem:

$$\int_{x_0}^{x_1} p(x)u(x)\,dx \to \min, \quad -1 \le u(x) \le 1.$$

Here $p(x)$ is some continuous function on the interval $[x_0, x_1]$.

How should we proceed? It is easy to see that our integral will be least
if we put $\hat{u}(x) = -1$ when $p(x) > 0$ and $\hat{u}(x) = +1$ when $p(x) < 0$,
that is $\hat{u}(x) = -\operatorname{sign} p(x)$. In the general problem ($p_2$) we must proceed
in an analogous manner, namely for each $x$ in $[x_0, x_1]$ we must find the $u$
in $[u_1, u_2]$ for which the function $f(x, u)$ (of $u$) has a minimum on this
interval.

This can be formulated as the following minimum principle: For a function
$\hat{u}(x)$ to be a solution of ($p_2$), it is necessary that

$$\min_{u \in [u_1, u_2]} f(x, u) = f(x, \hat{u}(x)).$$
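
The pointwise rule is easy to test numerically in the special case. In the sketch below the choice p(x) = sin x on [0, 2π], the midpoint rule, and the family of competing controls are all illustrative assumptions; the control that minimizes the integrand at every point is compared with a few hundred other admissible controls.

```python
import math
import random

def integral(p, u, x0, x1, n=2000):
    """Midpoint-rule approximation of the integral of p(x)*u(x) over [x0, x1]."""
    h = (x1 - x0) / n
    return sum(p(x0 + (k + 0.5) * h) * u(x0 + (k + 0.5) * h)
               for k in range(n)) * h

p = math.sin                                   # illustrative p(x)
u_hat = lambda x: -1.0 if p(x) > 0 else 1.0    # pointwise minimizer, -sign p(x)

best = integral(p, u_hat, 0.0, 2 * math.pi)

rng = random.Random(0)
others = []
for _ in range(200):
    a, b = rng.uniform(-1, 1), rng.uniform(0, 5)
    # Another admissible control: any function clipped to [-1, 1].
    trial = lambda x, a=a, b=b: max(-1.0, min(1.0, a + math.sin(b * x)))
    others.append(integral(p, trial, 0.0, 2 * math.pi))

print(best, min(others))   # best is close to -4, the integral of -|sin x|
```

No competing control does better, because at every single point the product p(x)u(x) is already as small as the constraint allows.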

Now we can explain what the maximum principle is about. Lagrange’s
general conception applies to the problem of optimal control except that it
must be modified somewhat. If we are required to solve the optimal control
problem ($p_1$) subject to the additional constraint (*), then we must form
the Lagrange function $\mathcal{L}$ (without reflecting in it the constraint (*)) and
again consider conceptually the problem

$$\mathcal{L} \to \min, \quad y_i(x_0) = y_{i0}, \quad y_i(x_1) = y_{i1}, \quad i = 1, \dots, n, \quad (u_1, \dots, u_r) \in U.$$

With respect to $y$ we proceed as we did earlier, that is, we form the Euler
equations. With respect to $u$ we apply the minimum principle. Since all
terms involving $u$ enter the Lagrange function with a “−” sign, it is more
convenient to write it as a maximum principle, where
$(\hat{y}_1(x), \dots, \hat{y}_n(x), \hat{u}_1(x), \dots, \hat{u}_r(x))$ is a solution of the problem:

$$\max_{(u_1, \dots, u_r) \in U} \big(p_1(x)\varphi_1(x, \hat{y}_1(x), \dots, \hat{y}_n(x), u_1, \dots, u_r) + \cdots + p_n(x)\varphi_n(x, \hat{y}_1(x), \dots, \hat{y}_n(x), u_1, \dots, u_r)\big)$$
$$= p_1(x)\varphi_1(x, \hat{y}_1(x), \dots, \hat{y}_n(x), \hat{u}_1(x), \dots, \hat{u}_r(x)) + \cdots + p_n(x)\varphi_n(x, \hat{y}_1(x), \dots, \hat{y}_n(x), \hat{u}_1(x), \dots, \hat{u}_r(x)).$$


Now, finally, we are ready to solve Newton’s problem. 


1° Formalization. Newton’s problem is formalized as a problem of optimal
control,

$$\int_0^a \frac{x\,dx}{1 + u^2} \to \min, \quad y' = u, \quad u \ge 0,$$

with boundary conditions $y(0) = 0$, $y(a) = b$.


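2° Necessary condition. Application of the Lagrange principle. The Lagrange
function is

$$\mathcal{L} = \int_0^a \left(\frac{\lambda_0 x}{1 + u^2} + p(y' - u)\right) dx.$$

The necessary condition for $y$ is Euler’s equation

$$p' = 0 \Rightarrow p = \text{const} = p_0.$$

The necessary condition for $u$ is the minimality condition

(**) $\min_{u \ge 0}\left(\frac{\lambda_0 x}{1 + u^2} - p_0 u\right) = \frac{\lambda_0 x}{1 + \hat{u}^2(x)} - p_0 \hat{u}(x).$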

3° Discussion. If we suppose that $\lambda_0 = 0$, then, necessarily, $p_0 \ne 0$
(otherwise all Lagrange multipliers would be zero). If $\lambda_0 = 0$ and $p_0 \ne 0$,
then (**) implies that $\hat{u} \equiv 0$, that is $\hat{y}(x) = \int_0^x \hat{u}(t)\,dt = 0$. Then the
required body “has no length”: it is a flat membrane. If $b > 0$, then it must
be assumed that $\lambda_0 \ne 0$, and we can put $\lambda_0 = 1$. Note that the case $p_0 \ge 0$
must also be ruled out, since in that case the function $(x/(1 + u^2)) - p_0 u$ is
monotonically decreasing and we cannot have (**).

If we study the behavior of the function $(x/(1 + u^2)) - p_0 u = \varphi(u, x)$
with respect to $u$, then it is easy to see that for $p_0 < 0$ and small $x$ this
function attains its minimum for $u = 0$. Then the optimal control is found
from the equation $-p_0 = 2ux/(1 + u^2)^2$, obtained by differentiating $\varphi(u, x)$
with respect to $u$. The break moment $\xi$ is determined by the fact that the
function $\varphi(u, \xi)$ has two minima.

Put differently, at the break moment the following relations ($\hat{u}(\xi)$ denotes
$\hat{u}(\xi + 0) \ne 0$) must hold:

$$-\frac{2\hat{u}(\xi)\xi}{(1 + \hat{u}^2(\xi))^2} = p_0, \qquad \frac{\xi}{1 + \hat{u}^2(\xi)} - p_0\hat{u}(\xi) = \xi.$$

From the second equation we obtain $-\xi\hat{u}^2(\xi)/(1 + \hat{u}^2(\xi)) = p_0\hat{u}(\xi)$, whence
$p_0 = -\xi\hat{u}(\xi)/(1 + \hat{u}^2(\xi))$. Substituting this relation in the first of the equations
just set down we find that $\hat{u}^2(\xi) = 1 \Rightarrow \hat{u}(\xi) = 1$, for $\hat{u} \ge 0$. We then find
from that same equation that $\xi = -2p_0$.

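After the break the optimal control satisfies the relation

$$x = -\frac{p_0(1 + u^2)^2}{2u} = -\frac{p_0}{2}\left(\frac{1}{u} + 2u + u^3\right).$$

But

$$\frac{dy}{dx} = u \Rightarrow \frac{dy}{du} = \frac{dy}{dx}\frac{dx}{du} = u\frac{dx}{du} = -\frac{p_0}{2}\left(-\frac{1}{u} + 2u + 3u^3\right).$$

Integrating this relation and bearing in mind that $\hat{y}(\xi) = 0$ for $\hat{u}(\xi) = 1$ we
obtain the parametric equations of Newton’s curve

$$y = -\frac{p_0}{2}\left(\ln\frac{1}{u} + u^2 + \frac{3}{4}u^4 - \frac{7}{4}\right), \qquad x = -\frac{p_0}{2}\left(\frac{1}{u} + 2u + u^3\right).$$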
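
The parametric description can be sanity-checked numerically. The sketch below rests on assumptions: the formula for y is obtained by integrating the stated relation for dy/du with y = 0 at u = 1, the value p₀ = -1 is an arbitrary illustrative choice, and the helper name `newton_curve` is hypothetical. It checks the break point and confirms that the slope dy/dx along the curve equals the parameter u.

```python
import math

def newton_curve(u, p0=-1.0):
    """Point (x, y) of the curve for slope parameter u >= 1 (with p0 < 0)."""
    x = -(p0 / 2.0) * (1.0 / u + 2.0 * u + u ** 3)
    y = -(p0 / 2.0) * (math.log(1.0 / u) + u ** 2 + 0.75 * u ** 4 - 1.75)
    return x, y

p0 = -1.0
x_break, y_break = newton_curve(1.0, p0)
print(x_break, y_break)      # break point: x = -2*p0 and y = 0

# Finite-difference check that dy/dx equals the parameter u.
for u in (1.5, 2.0, 3.0):
    h = 1e-6
    (xa, ya), (xb, yb) = newton_curve(u - h, p0), newton_curve(u + h, p0)
    print(u, (yb - ya) / (xb - xa))
```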
Instead of investigating the simplest problem of rapid response, I will refer
you to the book [1R] where this problem is solved using the Lagrange
principle. 

It’s time to end this story now. I have kept my promise and solved all 
problems from Part One twice. Let’s take a break from formulas and have a 
chat. 

What would I tell “the first high school student in the street” about the the- 
ory of extremal problems (recall one of the epigraphs to this story)? Surely, 
something along these lines: In school you learned about functions of one 
variable. They told you about Fermat’s method of solution of extremum 
problems for such functions. But, in fact, there are very many problems 
that reduce to the minimization of functions of many variables and even 
functions of functions (say, curves), as in the case of the brachistochrone 
problem. These problems have been investigated in a chapter of mathe- 
matics called the calculus of variations. The notion of a derivative—the 
fundamental notion of high school analysis—was generalized in functional 
(infinite-dimensional) analysis, a subject that arose at the beginning of this 
century. Infinite-dimensional analysis makes possible a unified view of the 
problem of minimization of a function of one and many variables and of 
problems of the calculus of variations. 

In this most general situation Fermat’s theorem remains fully valid for 
problems without constraints: at an extremum, the derivative must be zero. 
In the case of problems of the calculus of variations, the decoded version of 
Fermat’s theorem is a differential equation known as Euler’s equation. 
The number of problems without constraints is relatively small. A large
part of problems with constraints can be formalized as problems with con- 
straints in the form of equalities. 

Lagrange put forward a principle for the solution of finite-dimensional 
problems with equalities. Its essence consists in the formation of the Lagrange 
function (that is, the sum of the function to be minimized and the functions 
that determine the equalities multiplied by undetermined coefficients) and in 
treating this function as if there were no constraints. (Here you could refer 
to the words of Lagrange in the epigraph to the twelfth story.) Lagrange’s 
general conception remains valid for problems of the calculus of variations, 
as well as for problems of optimal control—a new chapter of the theory of 
extremal problems. 

If my new student acquaintance showed further interest, I would tell him 
or her about the contents of Part Two of this book. 

Our next story deals with some general questions. 


The Last Story 


15


More Accurately, a Discussion 


All styles are fine except the boring one. 
A French saying 


It can hardly be denied that our elementary meth- 
ods are simpler and more direct than the methods 
of analysis. In general, when studying some scien- 
tific problem it is better to begin with its individual 
peculiarities than rely on general methods. 


R. Courant and H. Robbins


At the end of the century there existed a depressing 
tendency to turn from fundamental problems in me- 
chanics as well as in pure analysis. Contrary to the 
great tradition of Jakob Bernoulli and Euler, this for- 
malism quickly established itself in the French school 
and was reflected in Analytical mechanics. 


C. Truesdell 


All styles are fine except the boring one. This saying reflects the happy and 
life-affirming spirit of the French. Anything but boring! 

In the first half of the book I tried to entertain the reader. I told fairy tales, 
parables, stories, and anecdotes, strained for variety, dished out romance, 
fable, and poetry, followed the thoughts of great men ... . 
And in the second half everything changed, everything became mundane,
routine, and prosaic. No stories, no poetry, no frills. Functions, deriva- 
tives, the Lagrange principle ... . Utter monotony! Formalization, neces- 
sary conditions, solution of equations—and again formalization, necessary 
conditions, and so on. What a bore! 

Recall once more one of our epigraphs: “We must make it our goal to 
find a method of solution of all problems ... by means of a single, simple
method.” (Here d’Alembert had in mind the problems of dynamics.) But is 
this goal attainable? Is it not true that the truths are concealed in countless 
hiding places? Can we expect to discover them by means of a single key or 
even a small bunch of keys? 

Courant and Robbins disagree with d’Alembert; “When studying some 
scientific problem it is better to begin with its individual peculiarities than rely 
on general methods,” they say. The well-known contemporary mechanician 
Truesdell contrasts the formalism of Lagrange with the great tradition of 
Euler and openly sides with Euler (see the epigraph to this story). 

Who is right—Euler or Lagrange, d’Alembert or Truesdell? 

We will ignore the apparent obviousness of the answer and dare to ask: Is 
poetic romance better than boring monotony? 

You can’t brush these questions aside easily, especially if you learn or 
teach. How can you learn and what should you teach? The concrete or the 
abstract? Problems or general principles? 

These are the questions I propose to discuss in this “story.” 

Some of the questions posed here may seem trivial. I'll begin with the 
simplest one—the one about romance and routine. 

Romance captivates us. We are attracted by mountains, icy deserts, stormy
waves, danger, and risk. We revere great heroes—travellers who discover new 
lands, mountain climbers who scale inaccessible peaks, and brave and daring 
seafarers. 

There was a time when reaching the North or South Pole was an enterprise 
for heroes. Once I happened to talk to a man who reached both poles. As a 
young man he liked to travel. Then he suffered from an eye disease and could 
no longer carry heavy backpacks or make long marches on foot. It was after 
he contracted the disease that he stayed at the poles. To do this he didn’t 
have to freeze, to get over icefields, or fall through polynyas.* He flew there
in an airplane. 

To this very day you can try to reach one of the poles by yourself or with 
a group of friends, using dog teams, or skis, or in a balloon, surmounting 
difficulties, with romance and risk. 

But there is also the other way of getting there—by plane. When did easier 
travel begin? Was it not when the wheel was invented? And then came the 
cart, the steam engine, the railroad, the automobile, and, finally, the plane. 
The plane allows anyone to reach the poles without poetry and romance. 

*Open water in an icefield, which is usually frozen over. (Tr.) 

Man cannot advance without heroism, without obsessive preoccupation, 
without risk, without romance and poetry. But then, unavoidably, the time 
comes when the alluring and distant goal becomes accessible to all, when it 
has been truly mastered. Then no one does anything heroic to reach the goal. 
Someone prepares the plane for a flight and does ordinary, everyday work. 
Then comes the takeoff order. At the airport there are no escorts and no 
orchestras. There are just people doing their work. The plane takes off and 
then lands at the pole. But it is only when such a flight becomes ordinary, 
routine, or commonplace, that we can say that the pole has been mastered. 

Can anyone doubt that at some future time, when mankind will have mas- 
tered its burden of aggravating problems, it will make sure that everyone 
can stand on Mount Everest without clambering up its face and gasping for 
breath? 

Progress in life and in science combines the efforts of pioneers and the 
steady forward movement of our whole civilization. In time, this movement 
makes accessible to all people the goals that were attained earlier only by 
heroes through suffering and sacrifices. 

In science, as in life, you can pursue two different aims. You can train 
to try to scale an inaccessible peak by yourself, without special equipment. 
But you must also take part in the collective effort that secures the steady 
movement of the civilization by building roads and communication lines to 
the peaks. And therefore you should learn both approaches. 

This brings us to the topic of what and how to teach. There is learning 
for development (at preschool age), for general education (in school), and 
for professional education (at colleges and universities). Each of these stages 
must be considered separately. One could well ask: Has this not yet been 
thought through? After all, this is one of the most fundamental questions, of 
relevance to each individual and to all mankind. 

There was a time when I thought that all questions had been answered and 
that all had been known for a long time—on earth, in heaven, in science, and 
in life. After all, there has been life on earth for so long, and there have been 
so many wise men! 

In this book I have tried to present a chapter of mathematics from its origin 
to its present state. Let’s glance once more at the past. Did this chapter begin 
a long time ago? Yes and no. Twenty-five centuries is, of course, a long time. 
On the other hand, think of a selection of people, one per generation, five 
per century. This gives 125 people from the present generation to Aristotle. 
How small a number! 

We are still very young. Mankind has just begun to master the world. 
Strictly speaking, sciences in the modern sense of the word began some 300 
to 400 years ago. We already know so much, yet so little! Your grandparents 
may have known people who were born in the age of the cart. Virtually 
before our eyes, in little more than a hundred years, man has mastered all 
that now fills our life—steamships, railroads, the telegraph, the telephone, 
the automobile, the plane, the TV set, artificial satellites. Mankind has not 
yet found the time to think through in detail some of the most important 
questions in life. In particular, there is no clear answer to the question of 
what to teach. But regardless of how the question of the content of education 
will be answered in the future (should one, in addition to the usual subjects, 
teach very early the handling of computers and word processors, car driving, 
shorthand and typing, editing, and so on—or should one not?), I have no 
doubt that one should teach mathematics for at least two reasons—to train 
the mind and to make possible the understanding of the structure of the 
world. 

The debate over how to teach mathematics has been alive throughout our 
century. At the beginning of this century the best mathematicians—such as 
Klein, Borel, and Hadamard—took part in this debate. In order to delineate 
one of the issues of this unceasing debate I will quote at some length from 
Dieudonné, one of the most distinguished French mathematicians of our 
time: 


Please look objectively at the following topics that take up 
most of the time in school mathematics: 


I. “Ruler and compass” constructions. 

II. Properties of “traditional” figures, such as triangles, quadrilaterals, 
circles, and systems of circles—with all the refinements accumulated 
by generations of “geometers” in search of suitable examination ques- 
tions. 

III. A whole psalter of “trigonometric formulas” and their kaleidoscopic 
transformations that make possible the finding of splendid “solutions 
of problems” on triangles and—please keep this in mind—“in a form 
convenient for taking logarithms ...” 

Dieudonné goes on to say that no one encounters anything like this in life. 
He runs down “old school mathematics” with the same vigor and insistence: 


The question arises of whether it is more important for the 
builder to know that the altitudes of a triangle intersect in 
one point or to know the principles of the theory of strength 
of materials? 


Dieudonné thinks that one should teach principles, and only principles. 
(Read the introduction to Dieudonné’s book Linear algebra and elementary 
geometry. There you will find many other interesting things.) 

I learned “the old way” and I can add a great deal more to Dieudonné’s 
list, for example, arithmetical problems of the “pool-filling” type, endless 
arithmetical examples involving addition of ordinary and decimal fractions, 
and so on. 

You might again think that there is nothing more to discuss, that Dieu- 
donné is obviously correct. It is indeed the case that 


Trigonometric formulas are indispensable for representatives 
of three thoroughly respectable professions: 1. for astronomers; 
2. for surveyors; 3. for writers of trigonometry textbooks, 


and for no one else. Why then befuddle the poor schoolboy? 

And yet there is something that will not allow me to concede the correctness 
of these words. “Education” is a very complicated notion. It involves not 
only the acquisition of knowledge but also training in how to think. It appears 
that for two centuries (or more) pool-filling problems, construction problems, 
problems on triangles, and transformations of trigonometric formulas served 
a vital purpose—they provided food for the mind, they taught exactness and 
accuracy, they taught reasoning, the search for truth, the surmounting of 
difficulties, the trying out of different roads leading to some objective, and 
the reaching of that objective. They imparted the joy of achievement and a 
sense of beauty. In brief, they modeled creativity. With what do we replace 
all this? And is it worth it? 

It is absolutely necessary to retain these elements of creativity. The ma- 
terials may, conceivably, be changed but these elements must be retained. 
The only way to teach thinking is with concrete “special” problems and not 
with general principles alone. It seems to me that one must provide the op- 
portunity for solving Heron’s problem Heron’s way, for looking at Euclid’s 
problem with Euclid’s eyes, for experiencing the difficulty of Archimedes’ 
problem after reaching the level of his technical means, for trying to solve 
Steiner’s problem by oneself. 

I wrote Part One because I think that extremum problems provide won- 
derful material for teaching thinking, inventiveness, scientific flexibility, and 
the overcoming of intellectual difficulties. 

But I have no doubt that one must also teach the understanding of the 
essence of things, general principles and laws, both in the natural sciences and 
in life. Thus it seems to me that everyone must be familiar, at least in some 
general way, with the basics of mathematical analysis, because mathematical 
analysis is an inseparable component of the natural sciences. 

There is something else that is important. It is important to realize and 
to understand the unity and the variety of the world. Electricity, light, heat, 
fluid motion, the motions of the planets—these are all different, but they 
all have features in common. Nature is “controlled” by general laws, all is 
connected, and all gravitates to oneness. And there is yet another idea that I 
think is important, and that is to realize that there exist “general principles.” 
Such principles exist in mathematics as well. Lagrange, d’Alembert, and 
many other great scientists tried to attain an understanding of that “very 
essence” that unifies the uncoordinated phenomena in the world. One such 
principle—Lagrange’s principle—was discussed in Part Two of this book. 

Recall that we tried to solve each problem twice, once in Part One “be- 
ginning with its individual peculiarities,” and a second time in Part Two “by 
relying on general methods” (I am once more quoting Courant and Robbins). 
Courant and Robbins arrived at the conclusion that “elementary methods are 
simpler and more direct than the methods of analysis.” Does this hold? 

It seems to me that in this dispute the general Fermat-Lagrange method 
is certainly second to none. True, it is sometimes defeated—for example, in 
the problem of the triangle of least perimeter (of course, it is possible that 
my solution is not optimal). It is also possible that some of my readers will 
declare other brilliant elementary solutions the victors. But it can hardly be 
denied that this method fought nobly, never once refused the challenge of 
single combat, and always brought us to our goal. And in some cases its 
victory was undeniable. 

We can see in all this a definite pattern that often accompanies man’s 
quests. A glimmer of truth appears in the dark, man wanders a long time, 
makes his way through an impenetrable thicket, and then it turns out that all 
the long and painful searches have been in vain and the road to the goal is a 
short one. But have the efforts really been in vain? Think about it. I wanted 
my book to give you a chance to retrace the thorny road of the search for 
truth. 

There is one more aspect of the Courant and Robbins position that is 
debatable. At the time when Courant and Robbins wrote their book, much 
was made of the contrast between elementary and higher mathematics. This 
contrast is 300 years old. The boundaries of elementary mathematics are 
determined by the proximity of mathematical analysis. All that was created 
before the birth of mathematical analysis, and a great deal of what turned 
up since then but doesn’t use its methods and constructions, is classified as 
elementary mathematics. Mathematical analysis itself is sometimes called 
“higher mathematics.” At one time it was thought that higher mathemat- 
ics contains something “supernatural,” something beyond the understanding 
of ordinary people, something truly “higher,” something that cannot be dis- 
cussed in school. This is not true at all. 

Mathematical analysis is a perfectly natural, simple, and elementary dis- 
cipline, not one iota more abstruse, complex, or “higher” than, say, “elemen- 
tary” geometry. Nowadays, insistence on the contrast between elementary 
mathematics and mathematical analysis is counterproductive. There is no 
need to display tremendous cleverness out of fear of using the properties of 
the derivative. 

Introduction of the elements of mathematical analysis into school pro- 
grams is bound to lead to a restructuring of other areas of mathematical 
education as well. The content of competition problems, of the work of 
mathematical circles, and of mathematical olympiads is bound to change. 
By now it is impossible to ignore the fact that a high school student must 
know something of the higher mathematics to which no access was available 
earlier. In the thirteenth story, when we solved anew the problems from 
Part One and discussed certain questions that could not be accommodated 
on the elementary level, I tried to demonstrate the possibilities hidden in the 
simplest tools of mathematical analysis. 

In this connection you should bear in mind that as soon as you have mas- 
tered the very basics of mathematical analysis, you can try to approach many 
contemporary problems. In the fourteenth story I tried to illuminate the road 
traversed in the theory of extremal problems from the time of Newton, Leib- 
niz, Euler, and Lagrange to the present day. I wanted to show that, in reality, 
the distance from Newton to ourselves is not far. 

I took as the epigraph for Part One of this book the words of Bertrand 
Russell, in which he contrasted mathematics with art. Before the beginning 
of the fourth story I quoted the words of G. H. Hardy in which he puts the 
Scientist above the Poet. I won’t get into this argument. Science and art unite 
in the combined notion of the worth of man’s reason. I hope my readers will 
apprehend a small fragment of the history of mathematical science as part 
of our general cultural heritage. 